Abstract



In this paper, we study distributed consensus in synchronous systems subject to both unexpected
crash failures and strategic manipulations by rational agents in the system. We adapt the concept of
collusion-resistant Nash equilibrium to model protocols that are resilient to both crash failures and
strategic manipulations by a group of colluding agents. For a system with n distributed agents, we
design a deterministic protocol that tolerates 2 colluding agents and a randomized protocol that
tolerates n - 1 colluding agents, and both tolerate any number of failures. We also show that if
colluders are allowed an extra communication round after each synchronous round, there is no protocol
that can tolerate even 2 colluding agents and 1 crash failure.

1 Introduction

Consensus is a distributed task at the core of many distributed computing problems. In consensus, each
process proposes a value and eventually all processes need to agree on an irrevocable decision chosen
from the set of proposed values. Extensive studies have been conducted on consensus protocols
tolerating various kinds of system failures, from crash failures to malicious Byzantine failures
(cf. [16, 5]).

Besides unexpected system failures, users of distributed systems may alter their protocol components to
achieve certain selfish goals, which we refer to as strategic manipulations of distributed protocols.
The issue is more evident in systems spanning multiple administrative domains, such as peer-to-peer
systems, mobile computing systems, and federated cloud computing systems, where each computing entity
has selfish incentives. Combining unexpected Byzantine failures with strategic manipulations in
distributed protocol design has been studied in the context of secret sharing and multiparty
computation [1, 3, 17], fault-tolerant replication [4, 7], and gossip protocols [15]. However, many
important topics on incorporating selfish incentives with fault-tolerant distributed tasks are left
unexplored. In particular, we are unaware of any work on incorporating selfish incentives with crash
failures for distributed consensus. Consensus protocols tolerating crash failures have been widely used
in distributed systems, and thus it is natural to ask how to further tolerate strategic manipulations
on top of crash failures. Moreover, Byzantine consensus protocols cannot simply replace crash-resilient
consensus protocols, because they tolerate fewer failures, are more complex, and often require costly
cryptographic schemes. Byzantine consensus protocols cannot resist strategic manipulations either,
because they only guarantee consensus and are not immune to manipulations that improve agents'
utilities while satisfying the consensus requirement. Therefore, studying consensus protocols resilient
to both crash failures and strategic manipulations is an important research topic of independent
interest.



In this paper, we make the first attempt to tackle the problem of distributed consensus resilient to
both crash failures and strategic manipulations. In particular, we study synchronous round-based
consensus subject to both crash failures and manipulations by strategic agents (so named to
differentiate them from mechanical processes) who have preferences over consensus decisions. As long as
consensus is reached, an agent could manipulate his algorithm in arbitrary ways, such as faking the
receipt of a message or pretending to crash, in order to reach a decision value he prefers. This models
scenarios in which agents may want to gain access in mutual exclusion protocols or become the leader in
leader election protocols, which are often implemented with a consensus component.

Standard consensus protocols are easily manipulated, as shown by the following motivating example.
In a standard n-process synchronous consensus protocol tolerating n - 1 crash failures [16], processes
exchange all proposed values they have received so far for n - 1 rounds, and at the end of round n - 1
decide on the smallest value they have received. Now consider a simple system of three agents
{1, 2, 3}, in which agent i prefers value v_i = i over other values and thus uses v_i as his consensus
proposal. Agent 2 can manipulate the above protocol in the following way (Figure 1(a)): if in round 1
agent 2 receives the proposal v_1 from agent 1, he does not include v_1 in his round-2 message to
agent 3; then if in round 2 agent 2 does not receive a message from agent 1 and does not see v_1 in the
message he receives from agent 3, he knows that agent 1 crashed before sending his proposal to agent 3.
In this case agent 2 decides his own proposal v_2, which is a better choice for him than v_1, the value
he would decide if he followed the protocol. Agent 3 only sees agent 2's proposal v_2 and his own
proposal v_3, and since v_2 < v_3, agent 3 follows the protocol and decides v_2, reaching consensus. In
all other cases, agent 2 follows the protocol, and it is easy to check that consensus is always
reached. Therefore, this standard synchronous consensus protocol is not resilient to strategic
manipulations by a single selfish agent.

Strategic manipulations introduce uncertainty and instability into the system and lead to unexpected
system outcomes, and thus should be prevented whenever possible. In particular, consensus may be
violated if more than one agent tries to manipulate the protocol at the same time. In the above
example, agent 3 may also conduct a manipulation symmetric to that of agent 2, if he prefers v_2 over
v_1. Consider a run in which all agents are correct in round 1 and in round 2 agent 1 crashes after
sending a message to agent 2 but before sending a message to agent 3 (Figure 1(b)). Agents 2 and 3
independently want to manipulate the protocol, and thus in round 2 they do not send v_1 to each other.
At the end of round 2, according to their manipulation rules, agent 2 would decide v_1 but agent 3
would decide v_2, violating consensus. Therefore, resiliency to strategic manipulation is desirable to
avoid such scenarios.
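
The two scenarios above can be replayed with a small simulation. The following is only a sketch under
stated assumptions: the function name run, the encoding of a crash as a pair (crash round, receivers
still reached in that round), and the generalized manipulation rule "withhold v_1 in round 2 and, if
neither agent 1 nor v_1 was heard in round 2, decide the smallest value other than v_1" are
illustrative choices and not part of the protocols designed in this paper.

```python
N, ROUNDS = 3, 2  # three agents, n - 1 = 2 rounds, proposals v_i = i


def run(crash, cheaters=frozenset()):
    """Replay the flooding protocol of the motivating example.

    crash maps a faulty agent to (crash_round, receivers_reached_in_that_round).
    Agents in cheaters apply the manipulation rule sketched in the text.
    Returns the decisions of the agents that never crash.
    """
    agents = range(1, N + 1)
    known = {i: {i} for i in agents}
    inbox = {}
    for r in range(1, ROUNDS + 1):
        inbox = {j: {} for j in agents}  # inbox[j][i] = round-r message from i to j
        for i in agents:
            if i in crash and crash[i][0] < r:
                continue  # i crashed in an earlier round and stays silent
            msg = set(known[i])
            if i in cheaters and r == 2:
                msg.discard(1)  # manipulation: withhold v_1 in round 2
            for j in agents:
                if j == i:
                    continue
                if i in crash and crash[i][0] == r and j not in crash[i][1]:
                    continue  # message lost because i crashes during round r
                inbox[j][i] = msg
        for j in agents:
            for msg in inbox[j].values():
                known[j] |= msg
    decisions = {}
    for i in agents:
        if i in crash:
            continue
        heard_from_1 = 1 in inbox[i]                         # round-2 message from agent 1
        saw_v1 = any(1 in msg for msg in inbox[i].values())  # v_1 inside a round-2 message
        if i in cheaters and not heard_from_1 and not saw_v1:
            decisions[i] = min(known[i] - {1})  # manipulated decision
        else:
            decisions[i] = min(known[i])        # protocol decision: smallest value
    return decisions


# Figure 1(a): agent 1 crashes in round 1 after reaching only agent 2.
honest = run({1: (1, {2})})                          # {2: 1, 3: 1}
manipulated = run({1: (1, {2})}, cheaters={2})       # {2: 2, 3: 2}
# Figure 1(b): agent 1 crashes in round 2 after reaching only agent 2.
violated = run({1: (2, {2})}, cheaters={2, 3})       # {2: 1, 3: 2}
```

In the first scenario agent 2 moves the common decision from v_1 to v_2; in the second, the two
independent manipulations leave agents 2 and 3 with different decisions.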

Designing a consensus protocol resilient to both crash failures and strategic manipulations is far from
trivial, since one needs to cover all possible manipulation actions and their combinations. Going back
to the above motivating example, for the described cheating action of agent 2, agent 3 may detect an
inconsistency if he receives a message from agent 1 in round 2 but does not see v_1 in agent 2's
round-2 message (Figure 1(c)). In this case, agent 3 could execute a punishment strategy to deter
agent 2 from taking the cheating action. Even if certain cheating actions can be detected, one has to
carefully go through all possible cases and detect all of them. More seriously, not all cheating
actions can be detected. For example, instead of the above cheating action, agent 2 may pretend to
crash in round 2, not sending any message to agents 1 and 3 (Figure 1(d)). It is easy to see that
agent 2 can still benefit from this cheating action but the action cannot be detected by others.
Therefore, tolerating manipulations together with crash failures is a delicate task. To make things
more complicated, we further aim to tolerate collusion among multiple agents.

Figure 1: Motivating example on crash failures combined with strategic manipulations in a consensus
protocol. Some unimportant messages are omitted for clarity. (a) A case of successful manipulation:
agent 2 manipulates his message to agent 3 in round 2 by not including v_1 in the message.
(b) Consensus violation due to independent manipulations: agents 2 and 3 both manipulate their round-2
messages to each other by not including v_1. (c) Detecting inconsistency after manipulation: agent 2
manipulates his round-2 message to agent 3 as in case (a), but agent 3 detects this manipulation
because he receives a message from agent 1 in round 2. (d) Successful manipulation without being
detected: agent 2 pretends to crash in round 2 to avoid sending a message to agent 3.

In this paper, we adapt collusion-resistant Nash equilibrium to model consensus protocols that resist
both crash failures and strategic manipulations. Roughly speaking, we say that a group of colluders can
manipulate a consensus protocol if they can change their protocol execution such that the deviation
still guarantees consensus in all possible crash scenarios, and in one crash scenario one of the
colluders benefits (i.e., obtains a better consensus decision). We say that a consensus protocol is
(c, f)-resilient if it solves consensus and no group of colluders of size at most c can manipulate the
protocol, in a synchronous system with at most f crash failures. Our contributions in this paper
include:

• We propose the solution concept of (c, f)-resilient consensus protocols to incorporate selfish
behaviors into crash-prone systems in distributed protocol design, and connect it with the theory of
social choice.

• We provide a deterministic (2, f)-resilient consensus protocol and a randomized (n - 1, f)-resilient
consensus protocol for any f ≤ n - 1. Both protocols are polynomially bounded in round complexity,
message complexity, and local computation steps, and neither of them relies on any cryptographic or
computational hardness assumptions.

• Moreover, we show that if colluders have an extra round of communication after each synchronous
round ends, then no (2, f)-resilient consensus protocol (even randomized) exists for any f ≥ 1. Thus
extra communication among colluders is very powerful for strategic manipulations.

Our study demonstrates both the feasibility and the difficulty of incorporating selfish incentives with
crash failures, and provides several techniques that could be used to defend against strategic
manipulations in other similar situations. We hope that our study could lead to more work on
incentive-compatible and fault-tolerant distributed protocols.



1.1 Related work 

Both fault-tolerant distributed computing and game theory address distributed entities that may exhibit
abnormal behaviors, whether system failures or strategic manipulations. The combination of the two,
however, is not yet widely explored. Several existing studies address the combination of Byzantine
failures with strategic manipulations [4, 15, 7, 1, 3, 17] in distributed protocol design.

The BAR (standing for Byzantine, Altruistic, and Rational) fault tolerance framework, proposed in a
series of works [4, 15, 7], incorporates rational behavior into Byzantine fault-tolerant distributed
protocols for state machine replication, gossip, terminating reliable broadcast, and backup services
built on these protocols. They also consider agent utilities that prefer certain proposed values, but
their solution concept is weaker than ours in two aspects. First, a rational agent would only deviate
from the given protocol when he could benefit no matter who the Byzantine agents are and how the
Byzantine agents behave, while in our model a rational agent could deviate as long as the deviation
guarantees consensus and he benefits in one possible crash scenario. Second, they do not consider
collusion among rational players. As a result, their protocols do not apply to our setting.

In [1, 3], Abraham et al. study rational secret sharing and multiparty computation that can tolerate
both Byzantine failures and a colluding group of rational agents. They provide matching upper and lower
bounds on implementing a trusted mediator resilient to k colluders and t Byzantine agents with
distributed protocols also resilient to k colluders and t Byzantine agents. Their solution concept of
(k, t)-robust equilibrium is stronger than ours in that (a) a rational agent could deviate as long as
he benefits in one possible Byzantine failure scenario, even if the deviation may lead to drastic
damage (e.g., violation of consensus) in other situations; and (b) Byzantine failures can only improve
the utility of non-Byzantine agents. Even though their solution concept is stronger, their positive
results do not apply to our consensus setting, because they need the condition that a (k, t)-robust
equilibrium with a trusted mediator exists, but in our consensus setting such equilibria do not exist
for any t ≥ 1 (see the discussion in Section 2.4).

In [17], Lysyanskaya and Triandopoulos also consider both rational and Byzantine behaviors in the
context of multiparty computation. They study a class of rational behaviors in which a rational agent gets 
increased utility if his computation gets close to the target value or other agents' computation gets further 
away from the target value. In our case, an agent would not get an increased utility simply because other 
agents are getting worse. 

There are a few other studies that address both Byzantine failures and strategic manipulations. In
[18], Moscibroda et al. study a virus inoculation game in which some players are Byzantine, and they
define the price of malice as a way to measure the impact of Byzantine players on system efficiency. In
[6], Blum et al. propose the use of regret minimization instead of Nash equilibrium in the study of
price of anarchy, and define the price of total anarchy, which is robust against Byzantine players who
do not play best responses. These studies, however, focus on introducing Byzantine behavior into
strategic games, while our work, together with [4, 7], focuses on introducing rational behavior into
fault-tolerant distributed computing protocol design.

A number of other works address strategic behavior in distributed protocol design, but they do not further 
consider unexpected system failures. Distributed algorithmic mechanism design (DAMD) is a framework 
proposed in [9, 10] and later refined in [20, 22, 8], which incorporates rational behaviors in the design
of distributed protocols such as multicast cost sharing and internet routing. Several studies address secret
sharing and multiparty computation among rational agents [13, 14, 12]. A concurrent work by Abraham
et al. [2] studies leader election with rational agents. They consider both synchronous and asynchronous 
networks with fully connected or ring network topologies, where agents prefer to be elected as the leader. 
Since they do not consider failures, their focus is on developing strategy-proof random selections so that 
every agent has a fair chance to be the leader. 



Paper organization. In Section 2 we describe our model for consensus protocols that resist both crash
failures and strategic manipulations, and also connect our model with a classic result on social choice
functions. Section 3 contains our main algorithmic results on (c, f)-resilient consensus protocols. We
start by presenting a deterministic (2, f)-resilient protocol for f ≤ n - 2 in Section 3.1, and then
extend it to a deterministic (2, f)-resilient protocol for f ≤ n - 1 in Section 3.2 and a randomized
(n - 1, f)-resilient protocol for f ≤ n - 1 in Section 3.3. We discuss protocol complexity and
summarize the techniques used in our protocols for resisting strategic manipulations in Section 3.4.
Section 4 shows the impossibility result on tolerating colluders with extra communication rounds. We
conclude the paper and discuss future directions in Section 5.

2 Modeling consensus resilient to strategic manipulations and crash failures 

2.1 System model 

We consider a distributed system with n agents Π = [n] = {1, 2, ..., n}. Agents proceed in synchronous
rounds, numbered 1, 2, and so on. In each round, agents start by sending messages to a selected set of
other agents. After sending out messages, agents receive the messages sent to them in the same round,
and then update their local states based on the received messages. Agents will not receive messages
sent in earlier rounds. The channels are reliable, that is, if a message is sent by agent i to agent j
in round r, and neither of them crashes in this round, then agent j will receive this message from
agent i at the end of round r. Formally, the synchronous round system provides two interface functions
send() and recv(). The invocations of send() and recv() have to be well-formed; more specifically, the
invocation sequence on an agent has to start with a send(), and then alternate between recv() and
send(). The r-th pair of send() and recv() is for synchronous round r. Let M be the set of all possible
messages sent in the system. Let Msgs be the array type [n] → M ∪ {⊥}. Interface send() takes one
parameter smsgs ∈ Msgs, such that if smsgs[j] ∈ M, smsgs[j] is the message the calling agent wants to
send to agent j in this round, and if smsgs[j] = ⊥, the calling agent does not want to send any message
to j in this round. Interface recv() returns a result rmsgs ∈ Msgs, such that if rmsgs[j] ∈ M, rmsgs[j]
is the message the caller receives from agent j in this round, and if rmsgs[j] = ⊥, the caller does not
receive any message from j in this round.
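
As an illustration of the interface just described, the following minimal sketch models the Msgs type
and one failure-free round of delivery. The names CallOrder and deliver, the use of strings as
messages, and the use of None for ⊥ are assumptions made for this example only; the sketch ignores
crashes.

```python
from typing import Dict, List, Optional

Message = str
Msgs = List[Optional[Message]]  # entry j-1 is the message to/from agent j; None stands for ⊥


class CallOrder:
    """Tracks that one agent's calls go send, recv, send, recv, ..."""

    def __init__(self) -> None:
        self.next_call = "send"
        self.round = 1

    def check(self, call: str) -> int:
        """Validate one invocation and return the round it belongs to."""
        if call != self.next_call:
            raise RuntimeError(f"ill-formed: expected {self.next_call}, got {call}")
        current = self.round
        if call == "recv":
            self.round += 1  # the r-th send/recv pair belongs to round r
        self.next_call = "recv" if call == "send" else "send"
        return current


def deliver(smsgs_by_agent: Dict[int, Msgs], n: int) -> Dict[int, Msgs]:
    """Failure-free round: agent j's rmsgs entry for i equals agent i's smsgs entry for j."""
    rmsgs: Dict[int, Msgs] = {j: [None] * n for j in range(1, n + 1)}
    for i, smsgs in smsgs_by_agent.items():
        if len(smsgs) != n:
            raise ValueError("smsgs must have one entry per agent")
        for j in range(1, n + 1):
            rmsgs[j][i - 1] = smsgs[j - 1]
    return rmsgs
```

For example, if agent 1 passes [None, "x", None] to send(), then agent 2's recv() result has "x" in the
entry for agent 1, and agents that were sent nothing see ⊥ in that entry.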

The round model is fixed by the system and cannot be manipulated by the agents. For example, an agent 
cannot wait to receive messages of round r and then send out his own message of round r that may depend 
on the messages received. More precisely, an agent can only manipulate the parameter smsgs of his
send(smsgs) calls, and nothing else, including the alternating sequence of send() and recv() invocations.

Agents may fail by crashing, and when an agent i fails in a round r, it may fail to send its round-r
messages to a subset of agents, and it will stop executing any actions in round r + 1 or higher.
Formally, agent failures are characterized by a failure pattern F, which is defined as a subset of
{(i, j, r) | i, j ∈ Π, r = 1, 2, 3, ...}, and satisfies the constraint that if (i, j, r) ∉ F, then for
all r' > r and all j' ∈ Π, (i, j', r') ∉ F. Thus (i, j, r) ∈ F means that i's message to j in round r
would be delivered should i send a message to j in round r. For some agent i, if for all j ∈ Π and all
r ≥ 1, (i, j, r) ∈ F, then we say that agent i is non-faulty (or correct); otherwise, we say that
agent i is faulty. We say that i crashes in round r if r is the smallest round such that there is some
j with (i, j, r) ∉ F, and i is alive in round r if for all j ∈ Π, (i, j, r) ∈ F.¹ In the paper, we use
f to represent the number of possible faulty agents in an execution.

¹ We treat the case in which i crashes after successfully sending out all round-r messages, but before
receiving round-r messages or updating local states, the same as the case in which i crashes at the
beginning of round r + 1 before sending out any round-(r + 1) messages, since no other agent can
distinguish these two cases.
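
The failure-pattern definition can be checked mechanically on a finite horizon. The sketch below
assumes that F is given as a Python set of triples (i, j, r) restricted to rounds 1..T; the
truncation to T rounds and the helper names all_triples, is_valid, crash_round, and num_faulty are
assumptions of this example.

```python
from itertools import product


def all_triples(n, T):
    """All (i, j, r) with i, j in 1..n and r in 1..T."""
    return set(product(range(1, n + 1), range(1, n + 1), range(1, T + 1)))


def is_valid(F, n, T):
    """Once some (i, j, r) is missing from F, every (i, j', r') with r' > r must be missing too."""
    for (i, j, r) in all_triples(n, T) - F:
        for r2 in range(r + 1, T + 1):
            for j2 in range(1, n + 1):
                if (i, j2, r2) in F:
                    return False
    return True


def crash_round(F, i, n, T):
    """Smallest round r with some (i, j, r) not in F, or None if i is correct up to round T."""
    for r in range(1, T + 1):
        if any((i, j, r) not in F for j in range(1, n + 1)):
            return r
    return None


def num_faulty(F, n, T):
    """Number of agents that crash within the first T rounds."""
    return sum(crash_round(F, i, n, T) is not None for i in range(1, n + 1))


# Agent 1 crashes in round 1 after reaching everyone except agent 3 (n = 3, T = 2).
F = all_triples(3, 2) - {(1, 3, 1), (1, 1, 2), (1, 2, 2), (1, 3, 2)}
```

With this F, agent 1 crashes in round 1, agents 2 and 3 are correct, and adding (1, 2, 2) back into F
would violate the constraint because a crashed agent cannot send in a later round.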

Let V be a finite set of possible consensus proposal values. The private type (or type) θ_i of agent i
is his preference on the set of proposals V. That is, it is a total order ≺_i on V, and u ≺_i v means
i prefers v over u. Let θ = (θ_1, θ_2, ..., θ_n) be the vector of private types.

The message history of agent i at the end of round r is an array mhist_i[1..r], where mhist_i[k] ∈ Msgs
denotes the messages i receives in round k, for k = 1, 2, ..., r. The local state of agent i at the end
of round r, i.e., before i invokes its (r + 1)-th send(), includes (a) the current round number r,
(b) the message history mhist_i[1..r] of i, and (c) his private type θ_i. As a convention, when r = 0,
the message history mhist_i[1..r] is empty, which represents the initial state at the beginning of the
first round.

A deterministic algorithm A_i of agent i is a function from his local state at the end of each round to
the messages he is going to send in the next round, i.e., A_i(r, mhist_i[1..r], θ_i) is the parameter
in agent i's (r + 1)-th invocation of send(), for r = 0, 1, 2, .... Let A denote the collection of
algorithms A_1, A_2, ..., A_n, which we also refer to as a protocol. Given a failure pattern F, a
private type vector θ, and a deterministic protocol A, the full execution of the system is determined,
and we call the execution a run. A run describes exactly which messages are sent and received by every
agent in every round. Let R(F, θ, A) denote this run. We will introduce randomized protocols later in
Section 3.3.



2.2 Rational consensus with strategic manipulations 

For the consensus task, each agent i has an output variable d_i with initial value ⊥ ∉ V. In every run,
d_i is changed at most once by i to a new value in V ∪ {⊤}, which is called the consensus decision of
i. The special symbol ⊤ ∉ V is not among the proposal values and is used as a punishment strategy by
agents, which will become clear when we introduce our consensus protocols in Section 3. We use
d(F, θ, A) to denote the decision vector when the run R(F, θ, A) completes and d_i(F, θ, A) to denote
the decision of i in the run. Note that some decisions may still be ⊥, either because the agent has
crashed or because the agent chooses not to decide. To incorporate the decision output into the model,
we slightly modify the definition of algorithm A_i of agent i, such that the function value
A_i(r, mhist_i[1..r], θ_i) also includes the decision value d, which could be ⊥, meaning that i does
not decide at the end of round r, or an actual value, meaning that i decides on this value at the end
of round r.

To align with the standard terminology in game theory, we define strategies of agent i as follows.
Recall that algorithm A_i of i is a function A_i : (r, mhist_i[1..r], θ_i) ↦ (smsgs, d). We define a
strategy s_i of agent i to be essentially the same as A_i, but rearranged to be a function of the
private type θ_i, that is, s_i : θ_i ↦ ((r, mhist_i[1..r]) ↦ (smsgs, d)), such that
s_i(θ_i)(r, mhist_i[1..r]) = A_i(r, mhist_i[1..r], θ_i). Let s = (s_1, s_2, ..., s_n) denote a strategy
profile, and let s(θ) denote (s_1(θ_1), s_2(θ_2), ..., s_n(θ_n)).
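
The rearrangement from an algorithm to a strategy is just currying on the private type. The sketch
below illustrates this; to_strategy and toy_algorithm are hypothetical names, a type is assumed to be a
tuple of values listed from most to least preferred, and None stands for ⊥.

```python
def to_strategy(algorithm):
    """Turn A_i(r, mhist, theta_i) into s_i with s_i(theta_i)(r, mhist) = A_i(r, mhist, theta_i)."""
    def s_i(theta_i):
        def play(r, mhist):
            return algorithm(r, mhist, theta_i)
        return play
    return s_i


def toy_algorithm(r, mhist, theta_i):
    """Hypothetical A_i for three agents: always send the top choice, decide it after round 1."""
    top = theta_i[0]
    smsgs = [top, top, top]
    decision = top if r >= 1 else None  # None stands for "no decision yet"
    return smsgs, decision


s_i = to_strategy(toy_algorithm)
behaviour = s_i(("b", "a", "c"))  # fix the private type first
first_send = behaviour(0, [])     # then query it with the local state (r, mhist)
```

Fixing the type first yields the object that a game-theoretic treatment calls the agent's action plan,
while the original algorithm takes the type as just another part of the local state.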

We focus on distributed consensus in this paper, which requires that all agents eventually decide on
the same value. We specify the consensus task in terms of the following legal strategy profile.

Definition 1 (legal strategy profile w.r.t. consensus, a.k.a. consensus protocol) A strategy profile s
is legal with respect to consensus if for any failure pattern F and any private type vector θ, the
resulting run R(F, θ, s) always satisfies the following properties:

• Termination: every correct agent eventually decides in the run.

• Uniform Agreement: no two agents (correct or not) decide differently.

• Validity: if some agent i decides v ∈ V, then v must be the most preferred value of some agent j
(according to θ_j).
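
The three properties can be stated as predicates on the decision vector of a single run. The following
sketch assumes that decisions maps each agent to a value, to TOP (standing for ⊤), or to None (standing
for ⊥), and that each type is a tuple listed from most to least preferred; these encodings and the
function names are assumptions of this example.

```python
TOP = "TOP"  # stands for the special punishment symbol ⊤, which is not in V


def termination(decisions, correct):
    """Every correct agent has decided (its entry is not None, i.e. not ⊥)."""
    return all(decisions.get(i) is not None for i in correct)


def uniform_agreement(decisions):
    """No two agents, correct or not, hold different decisions."""
    decided = {d for d in decisions.values() if d is not None}
    return len(decided) <= 1


def validity(decisions, types):
    """Any decision inside V is the most preferred value of some agent."""
    tops = {order[0] for order in types.values()}
    return all(d in tops for d in decisions.values() if d is not None and d != TOP)


def satisfies_consensus(decisions, correct, types):
    return (termination(decisions, correct)
            and uniform_agreement(decisions)
            and validity(decisions, types))
```

A strategy profile is legal only if this check passes for the run of every failure pattern and every
type vector, so the predicate above is the per-run part of Definition 1.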



Note that a strategy profile is essentially a protocol (a collection of agents' algorithms), and thus
we will use the terms strategy profile and protocol interchangeably. In this paper, we consider the
utility u_i of agent i to depend only on the decision vector of a run and to be consistent with i's
preference specified in his private type θ_i. Formally, given a run R(F, θ, s) under failure pattern F,
private type vector θ, and strategy profile s, we define the utility of i in this run,
u_i(R(F, θ, s), θ_i), to be: (a) if i crashes according to F in the run, then u_i(R(F, θ, s), θ_i) = 0;
(b) if the run satisfies the consensus properties (Termination, Uniform Agreement, and Validity) with
decision value d, then u_i(R(F, θ, s), θ_i) is a positive value such that the more i prefers d, the
higher the utility value; and (c) if the run violates at least one of the consensus properties,
u_i(R(F, θ, s), θ_i) = -∞. This utility function indicates that agent i is neutral if he crashes in the
run, he has a preference on the decision value when consensus is satisfied, and it would be a disaster
for him if consensus is violated.
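
One concrete utility function with the three cases above can be sketched as follows. The specific
positive values (the rank of the decision counted from the least preferred value) are an assumption of
this example, since the model only requires that a more preferred decision yields a higher utility; the
sketch also assumes that the decision is a value in V.

```python
import math


def utility(i, crashed, consensus_ok, decision, order):
    """Utility of agent i in one run.

    crashed      -- set of agents that crash in the run
    consensus_ok -- whether the run satisfies all three consensus properties
    decision     -- the common decision value (assumed to lie in V)
    order        -- agent i's type, listed from most to least preferred
    """
    if i in crashed:
        return 0.0            # case (a): neutral when crashed
    if not consensus_ok:
        return -math.inf      # case (c): violating consensus is a disaster
    # case (b): positive, and larger for more preferred decisions;
    # order.index raises ValueError if the decision is not a value in V
    return float(len(order) - order.index(decision))
```

With order = ("a", "b", "c"), a decision of "a" yields 3.0 and a decision of "c" yields 1.0, so any
run that keeps consensus is strictly better for a live agent than one that breaks it.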

In this paper, we allow a group of colluding agents to manipulate the protocol together in order to
benefit one of the colluders, which we formalize below. Let C ⊆ Π be the set of colluding agents. We
allow colluders to know the private types of all colluders in advance. To model this, we say that a
colluder i's strategy s'_i is a function s'_i : θ_C ↦ ((r, mhist_i[1..r]) ↦ (smsgs, d)), where θ_C is
the sub-vector of θ containing the entries θ_j for all j ∈ C. We use s'_C to denote the colluders'
strategy profile, which contains entries for i ∈ C where each s'_i is a function of θ_C as defined
above. We denote by (s_{-C}, s'_C) the new vector obtained from strategy profile s by replacing all
entries of s for i ∈ C with the corresponding entries in s'_C.

Definition 2 (group strategic manipulation) For a legal strategy profile s = (s_1, s_2, ..., s_n), a
group of agents C can strategically manipulate profile s if there exists a colluders' strategy profile
s'_C for the agents of C, such that (a) (s_{-C}, s'_C) is still a legal strategy profile, and (b) there
exist a failure pattern F and a private type vector θ in which all agents in C have the same most
preferred proposal, such that some i ∈ C is better off under (s_{-C}, s'_C):
u_i(R(F, θ, (s_{-C}, s'_C)), θ_i) > u_i(R(F, θ, s), θ_i).

We also refer to colluding agents as cheaters, and agents who follow the protocol as honest agents. We 
are now ready to introduce our central solution concept. 

Definition 3 ((c, f)-resilient equilibrium, or (c, f)-resilient consensus protocol) A (c, f)-resilient
equilibrium, or (c, f)-resilient consensus protocol, is a legal strategy profile s for consensus such
that no group of agents of size at most c can strategically manipulate profile s, in a system with at
most f crash failures.

Note that when c = 0, (c, f)-resilient equilibria are simply classic synchronous consensus protocols
tolerating f crash failures. When c = 1, if we remove the legal strategy profile requirement
(condition (a) in Definition 2), our solution concept would match the ex-post Nash equilibrium concept.
Our legal strategy profile requirement makes our solution concept unique, and we will justify its
inclusion shortly. Before providing justifications for our solution concept, we first connect it with
the theory of social choice functions and state an important result, which will be used in our
justifications.

2.3 Dictatorship in ex-post Nash equilibrium

Given a legal strategy profile s and a failure pattern F, we say that an agent i is a dictator of s
under F if for any private type vector θ, the decision in run R(F, θ, s) is always the most preferred
value of i. We now connect our solution concept with social choice functions and establish the result
that for any c ≥ 1 and f ≥ 0, any (c, f)-resilient equilibrium under any failure pattern must have a
dictator. A social choice function f (in our context, and not to be confused with the failure bound f)
is a function from a private type vector to a value in V. A social choice function f is incentive
compatible if there do not exist an agent i, a private type vector θ, and two proposal values
a, b ∈ V such that (a) f(θ) = b, (b) i prefers a over b in θ_i, and (c) i could find another private
type θ'_i so that f((θ_{-i}, θ'_i)) = a (cf. Chapter 9 of [19]). We say that a social choice function f
is a dictatorship if there exists an agent i such that for every private type vector θ, f(θ) is always
the most preferred value in θ_i. The following is the famous Gibbard-Satterthwaite Theorem on incentive
compatible social choice functions, which is closely related to Arrow's Impossibility Theorem on social
welfare functions.

Proposition 1 (Gibbard-Satterthwaite Theorem [11, 21]) If f is an incentive compatible social choice
function onto V and |V| ≥ 3, then f is a dictatorship.
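
For small instances, incentive compatibility and dictatorship can be checked by brute force. The sketch
below does so for |V| = 3 and two agents. It assumes that a type is a tuple listing values from most to
least preferred; the Borda-style rule borda_like is an illustrative choice made for this example and
does not appear elsewhere in the paper.

```python
from itertools import permutations, product

V = ("a", "b", "c")
TYPES = list(permutations(V))  # each type lists values from most to least preferred


def profiles(n):
    return product(TYPES, repeat=n)


def is_onto(f, n):
    return {f(theta) for theta in profiles(n)} == set(V)


def is_incentive_compatible(f, n):
    """No agent can obtain a value he strictly prefers by reporting another type."""
    for theta in profiles(n):
        b = f(theta)
        for i in range(n):
            for lie in TYPES:
                a = f(theta[:i] + (lie,) + theta[i + 1:])
                if theta[i].index(a) < theta[i].index(b):
                    return False
    return True


def dictator(f, n):
    """Index of an agent whose top choice always wins, or None if there is none."""
    for i in range(n):
        if all(f(theta) == theta[i][0] for theta in profiles(n)):
            return i
    return None


def first_agent_rules(theta):
    return theta[0][0]


def borda_like(theta):
    """Pick the value with the smallest total rank, breaking ties alphabetically."""
    score = {v: sum(t.index(v) for t in theta) for v in V}
    return min(V, key=lambda v: (score[v], v))
```

Here first_agent_rules is onto, incentive compatible, and a dictatorship, while borda_like is onto and
has no dictator, so by the theorem it cannot be incentive compatible; the brute-force check confirms
that some agent can gain by misreporting.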

Theorem 1 For any c ≥ 1 and any failure pattern F with at most f crash failures, if s is a
(c, f)-resilient equilibrium and |V| ≥ 3, then there always exists a dictator of s under F.

Proof. Consider any (c, f)-resilient equilibrium s and a failure pattern F with at most f failures. The
strategy profile s under F can be viewed as a social choice function f_{s,F} from a private type vector
θ to the decision value in the run R(F, θ, s). Since s is a (c, f)-resilient equilibrium with c ≥ 1, we
know that its corresponding social choice function f_{s,F} is incentive compatible. Indeed, if this is
not the case, then we can find an agent i, a private type vector θ, and a private type θ'_i of i, such
that f_{s,F}(θ) = b, f_{s,F}((θ_{-i}, θ'_i)) = a, and i prefers a over b in θ_i. If so, agent i can
choose an alternative strategy s'_i such that s'_i(θ_i) = s_i(θ'_i) and for all other θ''_i ≠ θ_i,
s'_i(θ''_i) = s_i(θ''_i). Essentially, when i's type is θ_i, he just pretends that his type is θ'_i.
Since s is a legal strategy profile, (s_{-i}, s'_i) is also a legal strategy profile, because i only
changes his reported type but nothing else. However, with s'_i agent i benefits under private type
vector θ and failure pattern F, contradicting the fact that s is a (c, f)-resilient equilibrium.

Moreover, since all runs R(F, θ, s) satisfy Validity of consensus, we know that any proposal in V could
be a possible decision value. That is, f_{s,F} is onto V. Thus by the Gibbard-Satterthwaite Theorem,
f_{s,F} must be a dictatorship, i.e., there always exists a dictator of s under F. □
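
The deviation used in the proof, in which agent i plays as if his type were a different one whenever
his true type is a particular type, can be written as a wrapper around s_i. The helper name misreport
and the representation of strategies as Python callables on types are assumptions of this sketch.

```python
def misreport(s_i, true_type, fake_type):
    """Build s'_i: behave as s_i(fake_type) when the type is true_type, and as s_i otherwise."""
    def s_prime(theta_i):
        if theta_i == true_type:
            return s_i(fake_type)
        return s_i(theta_i)
    return s_prime


# Toy strategy whose "behaviour" is simply a label carrying the type it was given.
def label(theta_i):
    return ("plays as", theta_i)


deviated = misreport(label, ("b", "a", "c"), ("a", "b", "c"))
```

Because the deviating agent always runs some s_i(·) unchanged, every run of the deviated profile
coincides with a run of s on another type vector, which is the reason the proof gives for the deviated
profile remaining legal.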

Henceforth, we assume |V| ≥ 3, which by the above theorem means that any (c, f)-resilient consensus
protocol with c ≥ 1 must be a dictatorship under any failure pattern F.

2.4 Explanation and justification of the solution concept 

We are now ready to provide some explanations and justifications for our solution concept.

Remark 1 (On possible cheating behaviors). We allow an agent to cheat not only by faking a different
private type, but also by modifying his entire algorithm, such as sending messages he is not supposed
to send, pretending to have received messages he did not actually receive, or pretending to have a
crash failure. An agent has all the freedom to change his algorithm (i.e., strategy) as long as he
ensures that the changed algorithm together with the other agents' algorithms still guarantees
consensus, for all possible failure patterns and private type vectors.

Remark 2 (On legal strategy profile requirement). The requirement that the strategy profile after
manipulation is still legal (condition (a) in Definition 2) distinguishes our solution concept from the
existing treatments combining Byzantine failures with strategic manipulations [4, 7, 1, 3, 17]. In our
setting, colluders have to ensure that consensus is reached after the deviation in all possible failure
patterns, which means that rational agents are risk-averse in terms of reaching consensus. However,
conditioned on consensus always being ensured, rational agents would deviate from the protocol as long
as there exist some failure pattern and some type vector of agents in which some of them benefit, which
means they are risk-taking under the condition that consensus is guaranteed.

Our solution concept captures a natural situation where the game outcomes could be normal or disas- 
trous, and agents tend to be risk-averse in avoiding disasters but risk-taking when they are sure the disaster 
would not happen by their manipulations. In the case of consensus, violation of consensus could be disas- 
trous, for example, it may lead to inconsistent copies in state machine replications, or concurrent access of 
critical resources in mutual exclusion protocols, which may bring down the entire system. Therefore, agents 
are only willing to manipulate the protocol when they are sure consensus would not be violated. 

Our solution concept can be further explained if we consider that agents may have partial knowledge 
about the system, e.g., agents may know the probability distribution of failure patterns and type vectors of 
agents. In this case, if there is a non-zero probability that consensus is violated, the agent gets a $-\infty$
payoff since the utility of the disastrous outcome of violating consensus is $-\infty$, and thus agents will always
avoid consensus violation. On the other hand, if consensus is satisfied with probability one, then as long
as the agents' gain under some failure patterns and type vectors outweighs the loss under other cases, the
agents would deviate. By using our solution concept, we model the above situation without the complication
of modeling probabilistic events in the system. 
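
The probabilistic reading above can be illustrated numerically. The following is a minimal sketch under our own assumptions: the function name `expected_utility` and all payoff numbers are hypothetical and only serve to show that any non-zero probability of a $-\infty$ outcome dominates the expectation.

```python
import math

NEG_INF = -math.inf  # utility of the disastrous outcome (consensus violated)

def expected_utility(outcomes):
    """Expected payoff over (probability, utility) pairs.

    Zero-probability outcomes are skipped so that 0 * -inf does not
    produce NaN.
    """
    total = 0.0
    for prob, util in outcomes:
        if prob > 0:
            total += prob * util
    return total

# Hypothetical deviation that violates consensus with a tiny probability.
risky = [(0.999, 10.0), (0.001, NEG_INF)]
# Hypothetical deviation that never violates consensus but sometimes loses.
safe = [(0.5, 3.0), (0.5, -1.0)]

print(expected_utility(risky))  # -inf
print(expected_utility(safe))   # 1.0
```

In this toy setting the first deviation is never attractive, while the second one is, which matches the risk-averse and risk-taking aspects described in the remark.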

Moreover, in the consensus setting our requirement on legal strategy profile is also necessary. This is
easy to see when $c + f = n$: if we do not require the deviated strategy profile to be legal, then the $c$
colluders can simply decide on their most preferred value without any communication, hoping that the remaining
$n - c = f$ agents have all crashed, which means no $(c, f)$-resilient consensus protocol exists. The following
proposition further shows that this is the case for all $c, f \geq 1$.

Proposition 2 If we remove the legal strategy profile requirement specified as condition (a) in Definition 2,
then no $(c, f)$-resilient consensus protocol exists for any $c, f \geq 1$.

Proof. For a contradiction, let $s$ be such a $(c, f)$-resilient consensus protocol. Note that $s$ is also a
$(c, f)$-resilient consensus protocol with the legal strategy profile requirement, and thus Theorem 1 still applies to
$s$. Let $\mathcal{R}(F)$ be the set of runs $R(F, \theta, s)$ under failure pattern $F$, with all possible type vectors $\theta$. By the
Termination property of consensus, in all these runs all non-faulty agents decide. Let $r_F$ be the largest round
number at which some agent decides, among all runs in $\mathcal{R}(F)$. Since $V$ is finite, the number of possible type
vectors is also finite, and thus the set $\mathcal{R}(F)$ is finite and $r_F$ is a finite number.

Consider the failure-free pattern $F_0$. By Theorem 1, there is a dictator $d$ of $s$ under failure pattern $F_0$.
Let $F_1$ be a failure pattern in which $d$ crashes at the beginning of round $r_{F_0} + 1$, and all other agents are
correct. Since all agents have decided by the end of round $r_{F_0}$ in all runs with failure pattern $F_0$, crashing $d$
at the beginning of round $r_{F_0} + 1$ does not change any decision value. Thus $d$ is also the dictator of $s$ under
$F_1$. Let $F^*$ be another failure pattern in which $d$ crashes at the beginning without sending any messages.
Let $d^*$ be the dictator of $s$ under $F^*$. It is clear that $d^*$ is different from $d$: since $d$ has no chance to
send any messages in $F^*$, no other agent would know the most preferred value of $d$.

Now consider a graph with all possible failure patterns as vertices, in which two vertices $F$ and $F'$ share an
edge if one has exactly one more entry $(i, j, r)$ than the other. On this graph, we can find a finite path from $F_1$ to
$F^*$, since we can remove the entries $(d, j, r)$ one by one from $F_1$, starting from round $r_{F_0}$. Since the two ends
of this path have different dictators $d$ and $d^*$, respectively, along the path from $F_1$ to $F^*$ we can find the first
edge from $F_a$ to $F_b$ such that the dictator changes to another dictator $d''$ different from $d$. Let $(d, j, r)$ be
the additional entry that $F_a$ has compared to $F_b$.
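
The path argument above can be sketched in code. This is only an illustration under stated assumptions: failure patterns are modeled as frozensets of $(i, j, r)$ entries, only the entries of $d$ are tracked, and `toy_dictator` is a hypothetical stand-in for the dictator assignment of a protocol (it is not derived from any protocol in this paper).

```python
def first_dictator_change(path, dictator_of):
    """Return the first edge (F_a, F_b) on the path where the dictator changes."""
    for f_a, f_b in zip(path, path[1:]):
        if dictator_of(f_a) != dictator_of(f_b):
            return f_a, f_b
    return None

def removal_path(start, entries_in_removal_order):
    """Build the path obtained by removing one entry at a time."""
    path = [frozenset(start)]
    for entry in entries_in_removal_order:
        path.append(path[-1] - {entry})
    return path

D = 0  # the dictator under the initial pattern (hypothetical agent id)
f1_entries = {(D, 1, 1), (D, 2, 1), (D, 1, 2), (D, 2, 2)}
# Remove d's entries starting from the latest round.
order = sorted(f1_entries, key=lambda e: (-e[2], -e[1]))
path = removal_path(f1_entries, order)

def toy_dictator(pattern):
    # Hypothetical rule: d stays dictator while any of its round-1 messages is sent.
    return D if any(r == 1 for (_, _, r) in pattern) else 1

f_a, f_b = first_dictator_change(path, toy_dictator)
extra = next(iter(f_a - f_b))  # the entry (d, j, r) that F_a has and F_b lacks
print(extra)  # (0, 1, 1)
```

Here the receiver $j$ of the extra entry is the agent that can hide the message in the manipulation described next.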

We argue that agent $j$ can manipulate protocol $s$ if we do not have the legal strategy profile requirement
on the manipulation. Agent $j$'s manipulation is as follows. At the end of round $r$, when $j$ receives a message
from $d$, he simply pretends that he did not receive any message from $d$ in round $r$ and acts accordingly in
the later rounds. Agent $j$ will benefit if the run has failure pattern $F_a$ and he prefers the proposal
of $d''$ over that of $d$, since with his deviation all agents would behave as if they were in failure pattern $F_b$ and
decide on the proposal of $d''$. Therefore, $s$ is not a $(c, f)$-resilient consensus protocol for any $c, f \geq 1$. □

With the legal strategy profile requirement, however, the manipulation of $j$ described in the above proof would
be easily handled, because it is possible that $d$ does not crash in round $r$ and continues sending messages in
round $r + 1$, so that other agents would immediately detect that $j$ has manipulated the protocol and execute
a punishment strategy on $j$. Therefore, the legal strategy profile requirement is both reasonable and
necessary in our setting.

Remark 3 (on all colluders having the same most preferred proposal). In Definition 2 we require that
all colluders in $C$ have the same most preferred proposal. Without this requirement, no protocol can escape
a trivial cheating scenario in which a colluding dictator simply uses his colluding partner's most preferred
value instead of his own, as detailed by the proposition below. Moreover, it is also reasonable to assume that
colluders have the same most preferred value, since they usually share some common goal, which is what
brings them together in the first place.

Proposition 3 If we do not require that all colluders have the same most preferred value, then there is no
$(c, f)$-resilient consensus protocol for any $c \geq 2$. Moreover, even if we require that no colluder is worse off
in condition (b) of Definition 2, there is no $(c, f)$-resilient consensus protocol for any $c \geq 2$ and $f \geq 1$.

Proof. Suppose, for a contradiction, that there is a $(c, f)$-resilient consensus protocol $s$ when we do not
require that all colluders have the same most preferred value. Consider the failure-free failure pattern $F$.
Note that Theorem 1 only concerns non-colluding deviation, so it still applies in this case to $s$. Let $d$ be the
dictator of $s$ under failure pattern $F$. Let $i \neq d$ be another agent colluding with $d$, where $d$ and $i$ have different
most preferred values $v_d$ and $v_i$, respectively. Then $d$ could simply pretend that his most preferred value is
$v_i$. Since $d$ is the dictator, the consensus decision would be $v_i$, which is a strictly better result for $i$. Since
only the type of $d$ is changed, the resulting strategy profile must also be legal. Therefore, we have a case in which
$\{d, i\}$ collude and manipulate the protocol so that $i$ benefits from the manipulation. This contradicts
the assumption that $s$ is $(c, f)$-resilient.

In the above manipulation, $d$ is worse off after the manipulation. However, if $f \geq 1$, we can let $d$ crash
at the end, after the consensus decision is made, which means that in the new failure pattern $d$ is still the dictator.
In this failure pattern, $d$'s utility remains the same since he crashes, and $i$ is better off, so we have a
case in which no colluder is worse off if we allow at least one crash failure. □
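
The manipulation in this proof can be sketched as a toy example. Everything here is an illustrative assumption: agents are named "d" and "i", values are "x" and "y", preferences are given as ranked lists, and `decide` simply returns the dictator's reported most preferred value, as a dictatorial protocol would in the failure-free pattern.

```python
def decide(dictator, reported_top):
    """A dictatorial outcome: the decision is the dictator's reported top value."""
    return reported_top[dictator]

def prefers(ranking, a, b):
    """True if value a is ranked strictly above value b in the preference list."""
    return ranking.index(a) < ranking.index(b)

# Hypothetical types: d and i collude but have different most preferred values.
rankings = {"d": ["x", "y"], "i": ["y", "x"]}

truthful = {agent: ranks[0] for agent, ranks in rankings.items()}
honest_outcome = decide("d", truthful)

# d pretends that his most preferred value is the one of i.
deviated = dict(truthful, d=rankings["i"][0])
deviated_outcome = decide("d", deviated)

print(honest_outcome, deviated_outcome)  # x y
print(prefers(rankings["i"], deviated_outcome, honest_outcome))  # True
```

Agent i is strictly better off after the deviation, while d is worse off unless he crashes after the decision, which is the case handled in the second paragraph of the proof.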

Remark 4 (on one colluder benefiting from the deviation). In Definition 2 we allow some of the colluders
to be worse off as long as one colluder is better off. In fact, in our setting this is equivalent to requiring that
all non-faulty colluders are better off (the faulty agents always have utility 0 by definition, so they are not
worse off), as shown by the following proposition.

Proposition 4 Assume that we change condition (b) of Definition 2 such that we require all non-faulty
colluders in $C$ to be better off after the manipulation. Let $s$ be a $(c, f)$-resilient consensus protocol
under this new definition for any $c \geq 1$. Then $s$ is also a $(c, f)$-resilient consensus protocol under the old
definition.

Proof. Note that Theorem 1 only concerns the deviation of a single agent, thus it still applies to $s$ under
the new definition. Suppose, for a contradiction, that there exists a set $C$ of colluders that can
manipulate $s$ into another legal strategy profile $s' = (s'_C, s_{-C})$ under the old definition. Then there must
exist a failure pattern $F$, a type vector $\theta$, and an agent $i \in C$, such that $i$ decides $v_1$ in the run of $s$ with $F$ and
$\theta$, while $i$ decides $v_2$ in the run of $s'$ with $F$ and $\theta$, and $i$ prefers $v_2$ over $v_1$ in $\theta_i$. Since $s$ is a $(c, f)$-resilient
equilibrium under the new definition, by Theorem 1 there exists a dictator $d$ of $s$ under failure pattern $F$.

If $d \in C$, then in the run of $s$ under failure pattern $F$ and type vector $\theta$, agents decide on the most preferred
value of $d$. Since all colluders have the same most preferred value and $d$ is a colluder, no colluder would
benefit from any deviation. Thus we have $d \notin C$.

Let $\theta'_C$ be a type vector for agents in $C$ such that all agents in $C$ have type $\theta_i$. Let $\theta' = (\theta'_C, \theta_{-C})$. Let
$s''$ be a strategy profile identical to $s$ except that when the colluders in $C$ have type $\theta'_C$, they pretend that they
have type $\theta_C$ and then use strategy $s'_C$. The strategy profile $s''$ is still legal, because both $s$ and $s'$ are legal
and colluders use either $s$ or $s'$ in all cases.

We now argue that all non-faulty colluders are better off with failure pattern $F$ and type vector $\theta'$. First,
consider the run in which colluders do not cheat. The consensus decision in the run of $s$ with $F$ and $\theta'$ is
dictator $d$'s most preferred value, and since $d$ is not a colluder, it is the same as the consensus decision in
the run of $s$ with $F$ and $\theta$, which is $v_1$. Second, consider the run in which colluders manipulate $s$ in the
manner described above. The consensus decision in the run of $s''$ with $F$ and $\theta'$ is the same as in the run of
$s'$ with $F$ and $\theta$, according to the manipulation rule. Thus this decision is $v_2$. By our assumption above,
in $\theta'$ all colluders prefer $v_2$ over $v_1$, and thus as long as they do not crash, they will be better off with the
manipulation. This contradicts the condition that $s$ is a $(c, f)$-resilient consensus protocol under the new
definition. □

Remark 5 (on the non-applicability of the positive results of [1, 3]). The solution concept $(k, t)$-robust
equilibrium in [1, 3] is stronger than ours, yet their positive results do not apply to our case. The reason is
that their $t$-immune definition is stronger than our legal strategy profile definition in terms of fault tolerance.
In their definition, they require that no Byzantine group of size at most $t$ could decrease the utility of any
non-Byzantine agent in any case. In contrast, our definition follows the standard consensus definition, and thus
we only require that consensus is reached in spite of crash failures, but it is possible that crash failures change
the consensus decision and decrease some agent's utility. This difference in the fault tolerance requirement
leads to a significant difference in the case with a trusted mediator. Our $(c, f)$-resilient consensus protocol
can be trivially realized with a trusted mediator: the mediator simply receives all proposals in the first round,
selects the value from the agent with the smallest identifier, and broadcasts this value as the decision value.
However, no $(k, t)$-robust equilibrium exists even with a trusted mediator. The reason is that, if one exists,
there must be a dictator, as our Theorem 1 also applies to $(k, t)$-robust equilibrium. Then if the dictator is a
Byzantine agent, he could always select some value as his most preferred value in order to decrease some
other agent's utility, which means the protocol is not $t$-immune even with the trusted mediator. For this
reason, the positive results in [1, 3] cannot be applied to our solution concept. This is also why, if we
remove the legal strategy profile requirement (the solution concept is still weaker than the solution concept
in [1, 3]), we cannot even have a $(1, 1)$-resilient consensus protocol (Proposition 2), while in [1, 3], a
$(k, t)$-robust equilibrium could exist without additional assumptions, as long as $n > 3(k + t)$ and a $(k, t)$-robust
equilibrium with a trusted mediator exists.
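
The trusted-mediator realization mentioned in this remark can be sketched as follows. This is a minimal illustration, assuming that the proposals received in the first round are given as a dictionary from agent identifiers to values; the function name `mediator_decision` is hypothetical.

```python
def mediator_decision(proposals):
    """Return the proposal of the agent with the smallest identifier.

    proposals: dict mapping agent id -> value received by the mediator
    in the first round (crashed agents are simply absent).
    """
    if not proposals:
        raise ValueError("the mediator received no proposal")
    return proposals[min(proposals)]

# All three agents reach the mediator.
print(mediator_decision({3: "b", 1: "a", 2: "c"}))  # a
# Agent 1 crashes before sending its proposal.
print(mediator_decision({3: "b", 2: "c"}))  # c
```

The second call shows how a crash failure can change the decision and thus decrease the utility of some agent, which is allowed by our definition but not by the $t$-immune requirement.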

Remark 6 (on $(c, f)$-resiliency vs. $(c, f')$-resiliency for $f' < f$). One subtlety of our solution concept
is that $(c, f)$-resiliency does not directly imply $(c, f')$-resiliency for $f' < f$. The reason is the risk-averse
legal strategy profile requirement: when the number of possible failures decreases, risk-averse agents
need to worry about fewer failure patterns in guaranteeing consensus, and thus have more chances to
manipulate the protocol. This situation exists in general if the solution concept contains a risk-aversion
aspect: when the set of possible scenarios in the environment gets smaller ($f$ decreases in our case), risk-averse
agents have more chances to cheat, so a previous equilibrium may no longer be an equilibrium.

3 Collusion-resistant consensus protocols 

In this section, we first describe a deterministic $(2, f)$-resilient consensus protocol for any $f \leq n - 2$, and
then adapt the protocol to a deterministic $(2, f)$-resilient protocol and a randomized $(n - 1, f)$-resilient
protocol for any $f \leq n - 1$.

3.1 Alg-NewEpoch: Deterministic $(2, f)$-resilient consensus protocol for $f \leq n - 2$

The deterministic consensus protocol, named Alg-NewEpoch, consists of three components. In the first
component Alg-MsgGraph, agents exchange and update the status of every message that has occurred so far. In
the second component Alg-Dictator, agents use the message status collected in Alg-MsgGraph to
determine the current dictator of the system, in order to decide on the dictator's most preferred value. In the
third component Alg-Consistency, agents perform consistency checks and execute a punishment strategy
when detecting any inconsistency. We index every message m in a run as $(i, j, r)$, which means that m is
sent by agent i to agent j in round r. For convenience, we use sender(m), receiver(m), and round(m) to
denote the sender, receiver, and the round of m, respectively.

3.1.1 Component Alg-MsgGraph 

We first describe component Alg-MsgGraph. For each message m indexed by $(i, j, r)$, each agent p
records m in one of four statuses, sent, not-sent, never-known and uncertain, with the following intuitive
meaning: (a) sent: agent p knows (from all messages he has received) that the message m was sent by i 
successfully; (b) not-sent: agent p knows (from all messages he has received) that the message m was not 
sent successfully by i because i has crashed; (c) never-known: p would never know whether message m is 
sent successfully by i or not, no matter what happens later in the run; (d) uncertain: p does not know yet 
whether message m is sent successfully or not, but p may know about it later in the run. 
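
As a concrete data-structure sketch of the message index and the four statuses, one could write the following. The names `Msg`, `MsgGraph`, and `status` are our own illustrative choices, and representing a message graph as a dictionary whose missing keys mean uncertain is an assumption made for brevity.

```python
from typing import Dict, NamedTuple

class Msg(NamedTuple):
    """A message indexed as (i, j, r): sent by i to j in round r."""
    sender: int
    receiver: int
    round: int

SENT = "sent"
NOT_SENT = "not-sent"
NEVER_KNOWN = "never-known"
UNCERTAIN = "uncertain"

# A message graph maps message indices to statuses.
MsgGraph = Dict[Msg, str]

def status(graph: MsgGraph, m: Msg) -> str:
    """Status of m in graph; messages without a recorded label are uncertain."""
    return graph.get(m, UNCERTAIN)

graph: MsgGraph = {}
m = Msg(sender=1, receiver=2, round=1)
print(status(graph, m))  # uncertain
graph[m] = SENT
print(status(graph, m))  # sent
```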

Formally, each agent p maintains a variable MsgGraph, the value of which is a function from all $(i, j, r)$
tuples to {sent, not-sent, never-known, uncertain}. The value of MsgGraph of agent p at the end of round
r is denoted as $\mathrm{MsgGraph}_{p,r}$, for $r \geq 0$ ($\mathrm{MsgGraph}_{p,0}$ means the initial value of MsgGraph at p). In
$\mathrm{MsgGraph}_{p,r}$, all messages of round $r + 1$ or above have the default status uncertain. For a message m, we
use $\mathrm{MsgGraph}_{p,r}(m)$ to denote the status of m recorded in p's variable MsgGraph at the end of round r,
and when the context is clear we may simply write $\mathrm{MsgGraph}(m)$ or $\mathrm{MsgGraph}_p(m)$. The following
definition is used by the algorithm when labeling a message as never-known.

Definition 4 A sequence of messages $(m_0, m_1, \ldots, m_k)$ is called a message chain of m in some variable
MsgGraph at the end of round r if all of the following are satisfied: (a) $m = m_0$; (b) $\mathit{sender}(m_i) =
\mathit{sender}(m_{i-1})$ or $\mathit{sender}(m_i) = \mathit{receiver}(m_{i-1})$, for all $1 \leq i \leq k$; (c) $\mathit{round}(m_i) = \mathit{round}(m_{i-1}) + 1$,
for all $1 \leq i \leq k$; (d) $\mathrm{MsgGraph}(m_i) = \text{uncertain}$, for all $0 \leq i < k$; (e) $\mathit{round}(m_k) = r$, or
$\mathrm{MsgGraph}(m_k) \in \{\text{sent}, \text{not-sent}, \text{never-known}\}$.
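
Conditions (a) to (e) of Definition 4 can be checked mechanically. The sketch below is illustrative only: messages are plain (sender, receiver, round) tuples, a message graph is a dictionary in which a missing key means uncertain, and the function name `is_message_chain` is our own.

```python
SENT, NOT_SENT, NEVER_KNOWN, UNCERTAIN = "sent", "not-sent", "never-known", "uncertain"

def status(graph, m):
    """Status of m; messages without a recorded label are uncertain."""
    return graph.get(m, UNCERTAIN)

def is_message_chain(chain, m, graph, r):
    """Check whether chain is a message chain of m in graph at the end of round r."""
    if not chain or chain[0] != m:                                # (a)
        return False
    for prev, cur in zip(chain, chain[1:]):
        if cur[0] not in (prev[0], prev[1]):                      # (b)
            return False
        if cur[2] != prev[2] + 1:                                 # (c)
            return False
    if any(status(graph, x) != UNCERTAIN for x in chain[:-1]):    # (d)
        return False
    last = chain[-1]
    return last[2] == r or status(graph, last) != UNCERTAIN       # (e)

m0 = (1, 2, 1)  # from agent 1 to agent 2 in round 1, still uncertain
m1 = (2, 3, 2)  # from agent 2 to agent 3 in round 2, known to be not sent
graph = {m1: NOT_SENT}

print(is_message_chain([m0, m1], m0, graph, 3))         # True
print(is_message_chain([m0, (3, 1, 2)], m0, graph, 3))  # False, violates (b)
```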

Algorithm 1: Component 1: Alg-MsgGraph for agent i

 1  initialize $\mathrm{MsgGraph}_i$ such that all labels are uncertain
 2  $\mathit{live}_i^0$ = all agents
 3  Phase I. Sending messages in round k:
 4      send $\mathrm{MsgGraph}_i$ to all agents in $\mathit{live}_i^{k-1}$
 5  Phase II. Upon receiving messages sent to i in round k:
 6      $\mathit{live}_i^k$ = { $j \in \Pi$ : agent i received a message from j in round k }
 7      let $\mathrm{MsgGraph}_j$ be the MsgGraph value that i receives from agent j in round k
 8      repeat
 9          let m be a message from p to q in round r such that $\mathrm{MsgGraph}_i(m)$ = uncertain, for any $p, q \in \Pi$
            and $1 \leq r \leq k$; update $\mathrm{MsgGraph}_i(m)$ using the following rules:

            1. label m as sent in $\mathrm{MsgGraph}_i$ if and only if
               (a) $q = i$ and agent i received m in round r, or
               (b) $p = i$ and $r \leq k$, or
               (c) $\mathrm{MsgGraph}_j(m)$ = sent for some $j \neq i$.

            2. label m as not-sent in $\mathrm{MsgGraph}_i$ if and only if
               (a) $q = i$ and agent i did not receive m in round r, or
               (b) $r > 1$, and for some $j \in \Pi$, $\mathrm{MsgGraph}_i((p, j, r - 1))$ = not-sent, or
                   // p fails to send a message in round r - 1
               (c) $\mathrm{MsgGraph}_j(m)$ = not-sent for some $j \neq i$.

            3. label m as never-known in $\mathrm{MsgGraph}_i$ if and only if
               (a) (i) $r = 1$ or for all $j \in \Pi$, $\mathrm{MsgGraph}_i((p, j, r - 1)) \in$ {sent, never-known}, and
                   (ii) every message chain of message m in $\mathrm{MsgGraph}_i$ ends at some message of status
                        not-sent or never-known. // rule similar to 1(c) and 2(c) is not needed
10      until no message labels can be changed by the above rules




Algorithm 1 shows the pseudocode of Alg-MsgGraph. In each round, agents exchange and update
their MsgGraph variables. The update rule for agent i is summarized in line 9. Rules 1 and 2 for labeling message m
as sent or not-sent are self-explanatory. Rule 3 for labeling a message m as never-known is more complicated.
Essentially, the rule says that if i does not see that sender(m) has crashed in round round(m) - 1
(rule 3(a)(i)), and all possible message chains that could pass the status of m to i end up in lost messages
(rule 3(a)(ii)), then i would never know the sent or not-sent status of m, no matter what happens later,
and thus i labels m as never-known. We say that an agent learns the status of a message m if he updates
the label of m to a non-uncertain one. These message labels are important for the second algorithm component
to determine whether an agent has obtained enough information to warrant a change of dictatorship. Agent i
maintains the set of live agents up to round k in $\mathit{live}_i^k$, and only sends messages to live agents. This is used
by the second component as one mechanism to stop cheating behavior.
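
To make the simpler update rules concrete, the following sketch applies rules 1 and 2 of Algorithm 1 for one agent until no label changes. It is a partial illustration only: rule 3 (never-known) and the live sets are deliberately omitted, and the data layout (tuples for messages, a set `got` of (sender, round) pairs recording which messages agent i actually received, and the function name `apply_rules_1_and_2`) is our own assumption.

```python
SENT, NOT_SENT, UNCERTAIN = "sent", "not-sent", "uncertain"

def apply_rules_1_and_2(i, k, agents, graph, received_graphs, got):
    """Apply rules 1 and 2 for agent i at the end of round k until a fixpoint.

    graph:           dict (p, q, r) -> label, agent i's own MsgGraph (mutated)
    received_graphs: dict j -> MsgGraph that i received from j != i in round k
    got:             set of (p, r) pairs: i received p's round-r message
    """
    changed = True
    while changed:
        changed = False
        for r in range(1, k + 1):
            for p in agents:
                for q in agents:
                    m = (p, q, r)
                    if graph.get(m, UNCERTAIN) != UNCERTAIN:
                        continue  # only uncertain labels are updated
                    others = [g.get(m, UNCERTAIN) for g in received_graphs.values()]
                    if (q == i and (p, r) in got) or p == i or SENT in others:
                        graph[m] = SENT                      # rules 1(a), 1(b), 1(c)
                    elif ((q == i and (p, r) not in got)     # rule 2(a)
                          or (r > 1 and any(graph.get((p, j, r - 1)) == NOT_SENT
                                            for j in agents))  # rule 2(b)
                          or NOT_SENT in others):            # rule 2(c)
                        graph[m] = NOT_SENT
                    else:
                        continue
                    changed = True
    return graph

# Hypothetical run seen by agent 1: agent 3 crashes and is silent from round 2 on.
agents = [1, 2, 3]
g1 = {}
apply_rules_1_and_2(1, 1, agents, g1, {2: {}, 3: {}}, {(2, 1), (3, 1)})
from_2 = {(2, 1, 1): SENT, (2, 2, 1): SENT, (2, 3, 1): SENT, (1, 2, 1): SENT, (3, 2, 1): SENT}
apply_rules_1_and_2(1, 2, agents, g1, {2: from_2}, {(2, 1), (3, 1), (2, 2)})
print(g1[(3, 1, 2)])  # not-sent
apply_rules_1_and_2(1, 3, agents, g1, {2: {}}, {(2, 1), (3, 1), (2, 2), (2, 3)})
print(g1[(3, 2, 3)])  # not-sent
```

In this toy run the label of (3, 2, 2) stays uncertain after rules 1 and 2 alone, which is the kind of message that rule 3 is designed to resolve.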

3.1.2 Properties of Alg-MsgGraph 

In the following we list a series of lemmas that exhibit the properties of Alg-MsgGraph. For all lemmas
in this section, we assume that every agent follows the algorithm until some round T, that except for the
case of crashes no agent terminates the algorithm voluntarily at or before round T, and that all round numbers
mentioned in these lemmas are no larger than T.

Definition 5 A failure pattern F is consistent with a message graph MG for p at the end of round r, if in F
the value of variable MsgGraph of agent p at the end of round r is exactly MG, i.e., $\mathrm{MsgGraph}_{p,r} = MG$.

Lemma 5 Let m be a message from p to q in round r. If m is labeled as sent (resp. not-sent) by some agent
i in its MsgGraph variable in a run with failure pattern F, then $(p, q, r) \in F$ (resp. $(p, q, r) \notin F$).

Proof. Suppose m is labeled by agent i as sent in round k. By the algorithm, if i labels m according to rule
1(c), we can always trace it back to the first update of m to sent by some agent j, such that the rule used
is 1(a) or 1(b) in the algorithm. If agent j applies rule 1(a), then j = q receives m from p in
round r, so $(p, q, r) \in F$. If agent j applies rule 1(b), then j = p updates the label of m at the end of round
k and $r \leq k$, so p is alive until the end of round k, and $(p, q, r) \in F$.

Suppose now that m is labeled by agent i as not-sent in round k. If i labels m according to rule 2(c), then
we can trace it back to the first update of m to not-sent by some agent j using rule 2(a) or 2(b). If j applies
rule 2(a), then q = j does not receive m in round r, which means $(p, q, r) \notin F$. If j applies rule 2(b), then
for some $j' \in \Pi$, $(p, j', r - 1) \notin F$, and thus by the constraint on F, we know that $(p, q, r) \notin F$. □

Corollary 6 Let m be a message from p to q in round r. If m is labeled as sent (resp. not-sent)
in a message graph MG (the MsgGraph value of some agent at the end of some round) in a run with failure
pattern F, then no agent can have the same message labeled not-sent (resp. sent) in any round in the run
with the same failure pattern.

Proof. If m is labeled as sent in message graph MG in a run with failure pattern F, then by
Lemma 5, $(p, q, r) \in F$. Then no agent can label m as not-sent in F because, again by Lemma 5, it would
imply $(p, q, r) \notin F$. The case in which m is labeled as not-sent is argued symmetrically. □

Lemma 7 No rule in Alg-MsgGraph can be applied to any message that already has a different
non-uncertain status.

Proof. Let m be a message from p to q in round r. Suppose agent i changes the status of message m
in its MsgGraph variable from uncertain to a label $x \in$ {sent, not-sent, never-known} in round k. First, by
Corollary 6 we know that if x = sent (resp. not-sent), agent i cannot change m's label again to not-sent
(resp. sent). By rule 3(a) we know that i cannot change the label to never-known if the label is already sent
or not-sent. Therefore, the only case left to check is that x = never-known and i changes the status of m to sent
or not-sent.

Suppose, for a contradiction, that agent i updates the status of m in a later round $k' > k$ to sent or
not-sent. Consider first the case that i updates the status to sent. Clearly, this update cannot be done by
applying rule 1(a) or 1(b), because these two rules imply that agent i labels m as sent in round r, the round
in which m is sent, and before this update m's label must be uncertain. Thus, i makes the update using rule
1(c), which means i receives a message from j in round $k'$ that contains the sent label for m. We can follow
the message chain back until we find an agent $j_1$ who applies rule 1(a) or 1(b) to update m to sent. Hence
we have a sequence of agents $j_1, j_2, \ldots, j_t = i$ and a sequence of messages $m_1, m_2, \ldots, m_{t-1}$, such that
(a) $j_\ell$ receives message $m_{\ell-1}$ from $j_{\ell-1}$ in round $k' + \ell - t$, for $\ell = 2, 3, \ldots, t$; and (b) $j_1 = p$ or $q$, and
round $k' - t$ is round r, in which message m is sent. By condition (a) above and Lemma 5, we know that
on agent i the label of message $m_\ell$ can only be uncertain or sent, for all $\ell = 1, 2, \ldots, t - 1$. Now consider
the particular round k when i updates m to never-known. Right before the update, m's label is uncertain,
and the message sequence $m = m_0, m_1, m_2, \ldots, m_x$ with $k' + x - t = k$, or some prefix of this sequence,
forms a message chain of m by Definition 4. However, on this message chain, the label of the last message
cannot be not-sent or never-known by the above argument, contradicting rule 3(a) used by i to update m to
never-known. Therefore, we have a contradiction here.

Now consider the case that i updates the status of m to not-sent in a later round $k' > k$. Similarly, this
update cannot be done by applying rule 2(a), because this rule implies that before the update the status of
m must be uncertain. Rule 2(b) is in conflict with one of the conditions in rule 3(a), and thus i cannot
apply 2(b) either. Therefore, we can also trace back a message chain and find an agent $j_1 = q$ who applies
2(a) on message m, and all agents on this chain receive a message from their preceding agent in the chain.
The argument is then the same as in the case above, and we can show that agent i could not have updated m to
never-known in round k if we have such a message sequence. This concludes our proof. □

By the above lemma, the status of any message m in agent i's MsgGraph variable changes at most once 
from uncertain to not uncertain, and when it happens, we say that agent i learns the status of message m. 

Lemma 8 If a message m is labeled never-known in message graph $MG = \mathrm{MsgGraph}_{p,r}$ in a run with
failure pattern F, then for any $i \in \Pi$ that is alive at the end of round r in F, agent i cannot have the same
message labeled sent or not-sent in any round in the run with failure pattern F.

Proof. Suppose, for a contradiction, that there exists $i \in \Pi$ who is alive at the end of round r, such that
agent i labels m as sent or not-sent in a round $r'$. Let $r'' = \max(r, r') + 1$. By Lemma 7, m's label on i
at the beginning of round $r''$ is still sent or not-sent. Let $F'$ be a failure pattern such that the failure behavior
in F and $F'$ is the same for the first $r'' - 1$ rounds (i.e., $(i, j, k) \in F$ if and only if $(i, j, k) \in F'$ for all
$i, j \in \Pi$ and $k \leq r'' - 1$), p is alive at the end of round $r''$, and $(i, p, r'') \in F'$. Then in the run with $F'$,
i will send its MsgGraph, which labels m as sent or not-sent, to p in round $r''$. According to rules 1(c) and
2(c) of Alg-MsgGraph, p will update m's label to sent or not-sent after receiving the round-$r''$ message
from i. However, since p already labeled m as never-known by round $r < r''$, this contradicts
Lemma 7. □

Corollary 9 In any run of Alg-MsgGraph and for any $p, q \in \Pi$, any message m, and any round r, such that both p
and q are alive at the end of round r, and $\mathrm{MsgGraph}_{p,r}(m) \neq$ uncertain and $\mathrm{MsgGraph}_{q,r}(m) \neq$ uncertain,
we have $\mathrm{MsgGraph}_{p,r}(m) = \mathrm{MsgGraph}_{q,r}(m)$.

Proof. This follows directly from Corollary 6 and Lemma 8. □

Lemma 10 If agent q receives a message from agent p in round r, then all message statuses learned by p by
the end of round r - 1 are learned by q by the end of round r.

Proof. If a label p learned by the end of round r - 1 is sent or not-sent, then by rule 1(c) or 2(c), it is clear
that q learns the label. We now claim that q learns, by the end of round r, all never-known labels that p learned
by the end of round r - 1. Let $m_0, m_1, \ldots, m_\ell$ be the order of messages in which p learns
their never-known labels. If p learns multiple never-known labels in the same round, the order is still the
order in which p applies rule 3(a) in this round. We prove our claim by induction on the message order.
In the base case, when p learns the never-known label of $m_0$, all other learned labels are sent or not-sent.
Thus when q receives p's message in round r, those sent or not-sent labels will be learned by q. If at this
point the label of $m_0$ on q is still uncertain, we can see that all conditions in rule 3(a) are satisfied, because
they were satisfied on p when p learned the label (a possible difference is that the message chain of $m_0$ on
q may end at a never-known message while on p it must end at a not-sent message). Therefore, q will learn
the never-known label of $m_0$ and the base case is correct.

For the induction step, suppose that for messages $m_0, m_1, \ldots, m_s$, q learns their never-known labels by
the end of round r. For message $m_{s+1}$, p learns its never-known label based on sent/not-sent labels of other
messages as well as the never-known labels of $m_0, m_1, \ldots, m_s$, which can all be learned by q by the end of
round r. Therefore, if the label of $m_{s+1}$ on q is still uncertain, rule 3(a) can be applied on q, and q will learn
its never-known label at the end of round r. This completes the induction step.

Therefore, we know that q will learn all the labels that have been learned by p in the previous round, and the
lemma holds. □

Lemma 11 If a message m is labeled sent in $\mathrm{MsgGraph}_{p,r}$ for some agent p and round r, then all
message statuses that sender(m) has learned before round round(m) are also learned by agent p by the end of round r.

Proof. We prove this lemma by induction on the round number r, with base case r = round(m).

When r = round(m), since p updates m's label to sent by the end of round r, p must do so using either
rule 1(a) or 1(b). If p uses 1(a), then p = receiver(m). By Lemma 10, p learns all the labels that sender(m)
learns before round round(m). If p uses 1(b), then p = sender(m), and the statement is trivially true. Thus
the base case is correct.

For the induction step, suppose that the statement is true for rounds from round(m) to r, and we need to
prove it for round r + 1. If p updates the label of m via rule 1(a), it must have done so in round round(m),
and the statement holds by the induction hypothesis. If p updates the label via rule 1(b), the statement is trivially
true. Suppose now that p updates the label of m via rule 1(c); in particular, p receives a message $m'$ from q in
round r + 1 in which m is labeled as sent. By the induction hypothesis, by the end of round r, q learns all the
labels that sender(m) has learned before round round(m). Since p receives message $m'$ from q in round
r + 1, by Lemma 10, p learns all the message statuses that q has learned by the end of round r, which include
all the message statuses that sender(m) has learned before round round(m). Thus the induction step is also
correct, and the lemma holds. □

Lemma 12 For any agent i and round k, if i learns the status of all messages of round k, then i learns the
status of all messages before round k.

Proof. Suppose, for a contradiction, that there exists some message before round k that agent i has not
learned yet. Among all such messages, let m be the one in the earliest round, and let it be a message from p
to q in round $r < k$. Since agent i has learned all messages of round k, every message chain of message m
must end at some message of or before round k, with status either sent, not-sent or never-known. Consider
the following cases:

1. $r > 1$ and one of the messages from agent p in round r - 1 is labeled not-sent. According to rule 2(b)
of Alg-MsgGraph, message m should be labeled as not-sent by i.

2. There exists some message chain $(m, m_1, \ldots, m_k)$ of m on agent i such that the status of $m_k$ is sent.
By the definition of message chain of m, $\mathit{sender}(m_k)$ is either $\mathit{sender}(m_{k-1})$ or $\mathit{receiver}(m_{k-1})$,
and thus $\mathit{sender}(m_k)$ learns the status of $m_{k-1}$ by the end of round $\mathit{round}(m_{k-1}) = \mathit{round}(m_k) - 1$.
By Lemma 11, agent i learns the message status of $m_{k-1}$. This contradicts the definition of message
chain, which requires that the label of $m_{k-1}$ is uncertain.

3. The remaining cases. According to rule 3(a) of Alg-MsgGraph, message m would be labeled
never-known.

Thus we reach a contradiction in every case, which proves the lemma. □

Lemma 13 If no agents crash in round t and t + 1 for some t > 1, then for any correct agent i and a 
message m of round t, i learns the status ofm at the end of round t + 1. 

Proof. Let message m of round t be indexed as {p, q, t). 

If {p, q, t) G F, then p is ahve in round t and t + I, and p will label message m as sent at the end of 
round t (rule 1(b)) and i will receive this label from p in round t + I and label m as sent (rule 1(c)). 

If {p, g, t) ^ F, then p crashes in round i — 1 or earlier. If m is still uncertain on agent i at the end of 
round t + 1, consider a message m' that is the earliest round uncertain message on i at the end of round 
t + 1. Let m' be indexed as {p' , q' , r'), with r' < t. If r' > 1 and there exists a j such that {p' ,j, r' — 1) 
is labeled as not-sent on agent i, then i will label m' as not-sent (rule 2(b)). Thus consider either r' = 1 or 
for all j G n, {p',j, r' — 1) is labeled as either sent or never-known (cannot be uncertain by the selection of 
m'), so condition (i) of rule 3(a) holds. 

Consider any message chain of m', namely m' = m_0, m_1, ..., m_k, of agent i at the end of round t + 1.
We argue below that either the chain ends with a label of not-sent or never-known, or m_k will be removed
from the chain after labeling m_{k-1} on agent i. Let m_k be indexed as (p_k, q_k, r_k).

• Case 1. m_k is labeled sent. Since m' = m_0 is labeled uncertain, we know k ≥ 1. By Lemma 5, p_k is
alive in round r_k - 1. Since p_k = sender(m_{k-1}) or receiver(m_{k-1}), p_k learns the status of m_{k-1} by
the end of round r_k - 1. By Lemma 11, i learns the status of m_{k-1} by the end of round t + 1, which
means that m_k will be removed from the message chain after i learns the status of m_{k-1}.

• Case 2. round(m_k) = t + 1. In this case, since we know that m' is in round r' ≤ t, we have k ≥ 1.
If p_k is alive in round t + 1, then p_k learns the status of m_{k-1} at the end of round t, and p_k will send
a message to i in round t + 1. By Lemma 10, i will learn the status of m_{k-1}, and thus m_k will be
removed from the above message chain after i learns the status of m_{k-1}. If p_k is not alive in round
t + 1, then p_k crashes before round t. Thus p_k will not send a message to i in round t, causing i
to label message (p_k, i, t) as not-sent (rule 1(a)). Then at the end of round t + 1, i will label m_k as
not-sent. Hence in this case the chain ends with a label not-sent.

• Case 3. m_k is labeled not-sent or never-known. This is what we want.

We have exhausted all cases for a message chain. Our conclusion is that, if there are still message chains left 
after further labeling, all message chains end with labels not-sent or never-known. This satisfies condition 
(ii) of rule 3(a). Therefore, in this case, message m' would be labeled as never-known. 

We can repeat the above argument such that all uncertain messages in the earliest round will be labeled 
with sent, not-sent, or never-known, which implies that the status of message m will be learned by i at the 
end of round t + 1. □

Corollary 14 Let t be the last round in which some agent crashes in failure pattern F. For any correct agent
i and any message m, i learns the status of m in Alg-MsgGraph no later than round r = max{t + 2,
round(m) + 1} in the run with failure pattern F.



Proof. If round(m) ≤ t + 1, since no agent crashes in rounds t + 1 and t + 2, by Lemma 13, at the
end of round t + 2, agent i learns the status of all messages of round t + 1. Thus by Lemma 12, agent i
learns the status of message m. If round(m) > t + 1, because no agent crashes in rounds round(m) and
round(m) + 1, again by Lemma 13, agent i learns the status of m at the end of round round(m) + 1. □
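
As a small worked illustration of the bound in Corollary 14, the following sketch evaluates r = max{t + 2, round(m) + 1} for two messages. The function name learning_deadline and its parameter names are hypothetical and are not part of the protocol.

```python
def learning_deadline(last_crash_round: int, message_round: int) -> int:
    """Latest round by which a correct agent learns a message's status.

    Direct transcription of the bound r = max{t + 2, round(m) + 1},
    where t is the last round in which some agent crashes.
    """
    return max(last_crash_round + 2, message_round + 1)


# With the last crash in round 3, a round-2 message is covered by the
# t + 2 term, while a round-7 message is covered by the round(m) + 1 term.
early = learning_deadline(3, 2)
late = learning_deadline(3, 7)
print(early, late)
```

The first term dominates for messages sent up to round t + 1; afterwards each message is resolved one round after it is sent.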
Lemma 15 For any message m, if there exists some round r > round(m) such that MsgGraph_{p,r}(m) ≠
not-sent for all agents p that are alive at the end of round r, then no agent can label m as not-sent in any
round later than r.

Proof. Suppose by contradiction that there exists some message m such that the statement of the lemma is
not true. If there is more than one such message, we pick the one with the smallest round(m). Assume
that some agent p labels m as not-sent in some round r' > r, and again we pick the smallest r' such that
this holds.

First, it is easy to see that agent p cannot label m as not-sent by rule 2(a). If agent p learns it by rule
2(b), this means there exists a message m' with round(m') = round(m) - 1 and MsgGraph_{p,r'}(m') =
not-sent. By the assumption on the minimality of round(m), the statement of the lemma is true for m',
which means there exists p' ∈ Π such that MsgGraph_{p',r}(m') = not-sent; then by rule 2(b) we should have
MsgGraph_{p',r}(m) = not-sent, which is a contradiction. Finally, if agent p learns the status of m by rule
2(c), this means some agent learned this status in round r' - 1, which again contradicts the definition of r'. □

Lemma 16 If agent i labels a message m from p to q in round r as not-sent in its variable MsgGraph_i, and
either r = 1 or i labels all messages from p in round r - 1 as sent or never-known, then q must be alive at
the end of round r.

Proof. By the condition that either r = 1 or i labels all messages from p in round r - 1 as sent or
never-known, we know that i does not apply rule 2(b) when labeling m as not-sent. If i applies rule 2(a),
then it is clear that q = i is alive at the end of round r. Suppose now that i applies rule 2(c), by receiving
MsgGraph_j from j with MsgGraph_j(m) = not-sent. When j labels m as not-sent, it cannot be the case
that r > 1 and there exists some j' such that j has labeled the message from p to j' in round r - 1 as not-sent,
because if so i would label this message as not-sent too when receiving MsgGraph_j from j, but we know
that i labels such messages either as sent or never-known, contradicting Lemma 7. Hence j cannot apply
rule 2(b) when labeling m. If j applies rule 2(c) when labeling m, we can repeat the above argument again,
until we track back to an agent j_0 who applied rule 2(a) when labeling m as not-sent. This means q = j_0 is
alive at the end of round r. □

3.1.3 Components Alg-Dictator and Alg-Consistency 

In the second component, Alg-Dictator (shown in Algorithm 2) maintains the current dictator of the
system and decides the preferred value of the dictator when it is safe to do so. Component Alg-Dictator
runs in parallel with Alg-MsgGraph, which means that (a) when an agent i sends a message in
Alg-Dictator, the message is piggybacked on the message sent in Alg-MsgGraph; (b)
Alg-Dictator reads some variables maintained in Alg-MsgGraph, in particular MsgGraph_i and
live_i; and (c) after receiving messages in each round, every agent first runs Phase II of Alg-MsgGraph
and then runs Phase II of Alg-Dictator.

In Alg-Dictator, initially all agents set agent 1 as the default dictator (line 1). The current dictator d
sends out his most preferred value v_d in a NEWEPOCH(v_d) message to all agents (line 4). If dictator d is
still alive at the end of the round, he simply decides on v_d (line 10), and sends one more round of messages
before terminating the algorithm (line 8).

If dictator d crashes before he decides, other agents rely on their MsgGraph variables to determine whether
they still decide on d's most preferred value or switch to a new dictator (lines 11-20). If an agent i finds out
that all messages from dictator d in the round when d sends the NEWEPOCH messages are labeled either sent or
Algorithm 2: Component 2: Alg-Dictator for agent i with most preferred value v_i

 1  dictator_i = 1
 2  Phase I. Sending messages in round k:
 3    if dictator_i = i and agent i has never decided a value before then
 4      Send NEWEPOCH(v_i) to all agents in live_i
 5  Phase II. Upon receiving messages sent to i in round k:
 6    // First apply Phase II of Alg-MsgGraph to process MsgGraph messages received in this round
 7    if agent i decided a value in round k - 1 then
 8      terminate all components of Alg-NewEpoch
 9    if dictator_i = i then
10      decide v_i; end Phase II
11    repeat
12      if received NEWEPOCH(v) from dictator_i at round k' ≤ k and
13         ∀j ∈ Π, MsgGraph_i(dictator_i, j, k') ∈ {sent, never-known} then
14        decide v, the most preferred value of dictator_i; end Phase II
15      else if dictator_i ∉ live_i then
16        r = min{r' | ∃j ∈ Π, MsgGraph_i(dictator_i, j, r') = not-sent}
17        if (r = 1 or ∀j ∈ Π, MsgGraph_i(dictator_i, j, r - 1) ≠ uncertain) and
18           ∀j ∈ Π, MsgGraph_i(dictator_i, j, r) ≠ uncertain then
19          dictator_i = min{j ∈ Π | MsgGraph_i(dictator_i, j, r) = not-sent}
20    until dictator_i does not change in the current iteration




never-known, he will decide on v_d (line 14). This is because algorithm Alg-MsgGraph guarantees that in
this case no alive agent can detect the crash of the dictator and it is indeed possible that the dictator already
decides from the point of view of live agents. If instead i detects that the dictator d crashes before sending
out all NEWEPOCH messages, i finds the first round r in which some message from dictator d is labeled
not-sent (line 16), and if all messages from d in rounds r - 1 and r have non-uncertain labels, i switches the
dictator from d to a new d' who has the smallest id among all agents not receiving messages from d in round
r (lines 18-19). The change of dictatorship indicates that a new epoch starts, and agents repeat the same
procedure above in determining whether to follow the current dictator or switch to a new one.
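
The decision logic described above (lines 12-19 of Alg-Dictator) can be sketched as a single pass over one agent's view. This is only an illustration under stated assumptions, not the authors' implementation: the name dictator_step, the callable label (standing in for MsgGraph_i), and the return tuples are all hypothetical, and the sketch assumes that messages from an agent to itself are labeled like any other message.

```python
SENT, NOT_SENT, NEVER_KNOWN, UNCERTAIN = "sent", "not-sent", "never-known", "uncertain"


def dictator_step(dictator, agents, label, live, k, epoch_round=None):
    """One pass of lines 12-19 of Alg-Dictator on a single agent (hypothetical sketch).

    label(p, q, r) returns the agent's current label for message (p, q, r).
    epoch_round is the round k' in which NEWEPOCH from `dictator` was received, if any.
    Returns ("decide", d), ("switch", new_d), or ("wait", d).
    """
    # Lines 12-14: every round-k' message of the dictator is sent or never-known.
    if epoch_round is not None and all(
        label(dictator, j, epoch_round) in (SENT, NEVER_KNOWN) for j in agents
    ):
        return ("decide", dictator)
    # Lines 15-19: the dictator is no longer in live_i.
    if dictator not in live:
        failed = [r for r in range(1, k + 1)
                  if any(label(dictator, j, r) == NOT_SENT for j in agents)]
        if failed:
            r = failed[0]  # line 16: earliest round with a not-sent message
            prev_known = r == 1 or all(
                label(dictator, j, r - 1) != UNCERTAIN for j in agents)
            cur_known = all(label(dictator, j, r) != UNCERTAIN for j in agents)
            if prev_known and cur_known:
                successor = min(j for j in agents
                                if label(dictator, j, r) == NOT_SENT)
                return ("switch", successor)
    return ("wait", dictator)


# Agent 1 crashes in round 1 after reaching agents 1 and 2 but not agent 3.
labels = {(1, 1, 1): SENT, (1, 2, 1): SENT, (1, 3, 1): NOT_SENT}
view = lambda p, q, r: labels.get((p, q, r), UNCERTAIN)
print(dictator_step(1, [1, 2, 3], view, live={2, 3}, k=2, epoch_round=1))
```

In the protocol this pass sits inside the repeat-until loop of lines 11-20, so a caller would invoke it again with the new dictator after each switch until the result is no longer a switch.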

Component Alg-Consistency (shown in Algorithm 3) is for agent i to detect any inconsistency in
the run due to manipulations by colluders. It is run at the end of each round after Phase II of both
Alg-MsgGraph and Alg-Dictator has completed. The message history collected on agent i is inconsistent
if it cannot be generated by any valid failure pattern when following the protocol. Alg-Consistency
avoids enumerating all failure patterns by constructing one plausible failure pattern F' from MsgGraph_i
and then simulating the run of Alg-MsgGraph and Alg-Dictator only in this failure pattern. Once
inconsistency is detected, agent i decides a special value ⊤ not in the set V of possible proposals, which
violates Validity of consensus. This acts as a punishment strategy to deter colluders.
Algorithm 3: Component 3: Alg-Consistency for agent i with most preferred value v_i

 1  At the end of each round k:   // after Phase II of Alg-MsgGraph and Alg-Dictator
 2    Let mhist_i[1..k] be the message history of agent i.
 3    Construct a failure pattern F' such that for each message m indexed by (p, q, r), (p, q, r) ∉ F' if and
      only if MsgGraph_i(m) = not-sent.
 4    if ∃p, q, q', r s.t. (p, q, r) ∈ F' ∧ (p, q', r - 1) ∉ F', or the number of crashes in F' is larger than f then
 5      // F' is not a valid failure pattern
 6      decide ⊤ (⊤ ∉ V); terminate all components of Alg-NewEpoch
 7    Let v' = (v'_1 = ⊥, v'_2 = ⊥, ..., v'_i = v_i, ..., v'_n = ⊥).   // vector of simulated most preferred values
 8    foreach j ≠ i such that i has received a NEWEPOCH(v_j) message from j do
 9      Let v'_j = v_j
10    Simulate Alg-MsgGraph and Alg-Dictator with failure pattern F' and vector v' of most
      preferred values up to round k.
11    Let mhist'_i[1..k] be the message history for agent i in the simulation with F' and v'.
12    if mhist_i[1..k] ≠ mhist'_i[1..k] then
13      decide ⊤; terminate all components of Alg-NewEpoch
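
The validity test in line 4 of Alg-Consistency can be sketched as follows. This is a minimal illustration, assuming the failure pattern F' is represented by the set of message triples labeled not-sent and that messages from an agent to itself are ignored; the function name and parameters are hypothetical and not taken from the paper.

```python
def is_valid_failure_pattern(not_sent, agents, last_round, f):
    """Check the two conditions of line 4 of Alg-Consistency (hypothetical sketch).

    not_sent: set of (p, q, r) triples missing from F' (labeled not-sent).
    Returns False if some agent sends in round r after missing a message in
    round r - 1, or if more than f agents appear to have crashed.
    """
    for p in agents:
        others = [q for q in agents if q != p]
        for r in range(2, last_round + 1):
            missed_before = any((p, q, r - 1) in not_sent for q in others)
            sent_now = any((p, q, r) not in not_sent for q in others)
            if missed_before and sent_now:
                return False  # exists (p, q, r) in F' with (p, q', r - 1) not in F'
    crashed = {p for (p, _, _) in not_sent}  # senders with at least one missing message
    return len(crashed) <= f


agents = [1, 2, 3]
# Agent 1 misses agent 3 in round 1 and is silent in round 2: a plausible crash.
print(is_valid_failure_pattern({(1, 3, 1), (1, 2, 2), (1, 3, 2)}, agents, 2, f=1))
# Agent 1 misses agent 3 in round 1 but still sends in round 2: not a crash pattern.
print(is_valid_failure_pattern({(1, 3, 1)}, agents, 2, f=1))
```

The second call models the situation a cheater creates by dropping a single message and then continuing to send, which is exactly what the first condition of line 4 is meant to catch.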



3.1.4 Proof of Alg-NewEpoch (Alg-MsgGraph + Alg-Dictator + Alg-Consistency) 

We now prove that Alg-NewEpoch is a correct (2, f)-resilient consensus protocol for any f ≤ n - 2. In
our analysis, we first show that for any f ≤ n - 1, in order to manipulate Alg-NewEpoch, some cheater
has to either pretend a crash failure or fake the receipt of a message he does not receive, since most other
cheating actions are deterred by consistency checks. Then when f ≤ n - 2, pretending a crash is prevented
by the second condition in line 4 of Alg-Consistency, because it is possible that the total number of
failures exceeds f if a cheater pretends a crash, which could be detected by an honest agent. However,
guarding against faking messages is much more subtle, and relies on NEWEPOCH messages and the
way we change dictatorship. To do so, we carefully design the conditions for an agent to claim dictatorship
in Alg-Dictator, among which one important condition is that a new dictator has to be among agents who
do not receive the NEWEPOCH messages from the previous dictator, which reduces the incentive for an
agent to fake the message. These conditions together with the properties of Alg-MsgGraph guarantee that
the cheater has to fake the status of a NEWEPOCH message, and the cheater has to crash for his colluding
partner to benefit. Finally, we consider an alternative run in which the cheater who fakes the message is alive
but his partner dies at the same time he fakes the message. We show that in this case, the lone cheater cannot
get the most preferred value from the final dictator, the one whose message the cheater fakes, and thus
the cheater runs the risk of not being able to decide in a run.



For Lemmas 17 to 21 we assume that every agent follows the algorithm until some round T, and except 
for the case of crashes, no agent terminates the algorithm voluntarily at or before round T, and all round 
numbers mentioned in these lemmas are no larger than T. 

For any agent i, dictator_i may change during the run. We define the dictator chain of agent i to be the
sequence of dictator values that occur in variable dictator_i of agent i. Note that the first dictator in the
dictator chain is 1 according to line 1.

Lemma 17 Let D_0 = 1, D_1, D_2, ..., D_t be a dictator chain on some agent i. For every pair of consecutive
dictators D_{ℓ-1} and D_ℓ in the chain, there exists a message m_ℓ from D_{ℓ-1} to D_ℓ in some round r_ℓ, such that
(a) m_ℓ is labeled as not-sent on agent i, (b) D_{ℓ-1} crashes by round r_ℓ, (c) D_ℓ is alive at the end of round r_ℓ,
(d) no message from D_{ℓ-1} in round r_ℓ - 1 is labeled not-sent; and (e) r_1 < r_2 < ... < r_t.

Proof. When i changes the dictator from D_{ℓ-1} to D_ℓ, it does so because the message m from D_{ℓ-1} to D_ℓ
in some round r is labeled not-sent (line 19). Let m_ℓ = m and r_ℓ = r. Condition (d) in the statement is
true because r_ℓ is the earliest round in which the label of some message from D_{ℓ-1} is not-sent (line 16 and
the first condition of line 18). By Lemma 5, D_{ℓ-1} crashes by round r_ℓ. Moreover, we know that either r_ℓ = 1
or all messages from D_{ℓ-1} in round r_ℓ - 1 are labeled sent or never-known (implied by line 16 and the first
condition of line 18). By Lemma 16, we know that D_ℓ is alive at the end of round r_ℓ. Since D_ℓ crashes by
round r_{ℓ+1}, we know that r_ℓ < r_{ℓ+1}. □

Corollary 18 Any agent can appear at most once in any dictator chain. 



Proof. Immediate from Lemma 17 and the fact that no agent becomes alive again after a crash. □



Lemma 19 Let D_0 = 1, D_1, D_2, ..., D_t and D'_0 = 1, D'_1, D'_2, ..., D'_{t'} be two dictator chains on agents i
and j respectively, at the end of some round k when they are both still alive in the run. Then one chain is a
prefix of the other chain.

Proof. We prove the result by induction on the length of the shorter chain. The base case D_0 = D'_0
is already given. Suppose that D_ℓ = D'_ℓ. When agent i changes the dictator from D_ℓ to D_{ℓ+1}, it finds a
minimum round r in which a message from D_ℓ has label not-sent (line 16) and no message from D_ℓ in
round r - 1 or r is labeled uncertain (line 18). Similarly, when agent j changes the dictator from D'_ℓ to
D'_{ℓ+1}, it finds such a minimum round r'. We claim that r = r'. If not, suppose without loss of generality
that r < r'. By line 16 on agent i, the message from D_ℓ to D_{ℓ+1} in round r is labeled as not-sent by the end of
round k. By rule 2(b) of Alg-MsgGraph, the message from D_ℓ to D_{ℓ+1} in round r' - 1 is labeled as not-sent
on agent i by the end of round k. By Corollary 6 and Lemma 8, agent j cannot label the message from D_ℓ to
D_{ℓ+1} in round r' - 1 as sent or never-known by the end of round k. Since agent j selects a new dictator
D'_{ℓ+1} by the end of round k, by the first condition in line 18 of Alg-Dictator, j cannot label the message
from D_ℓ to D_{ℓ+1} in round r' - 1 as uncertain either at the end of round k. Thus, j must have labeled this message
as not-sent, but this contradicts the fact that r' is the minimum round in which some message from D_ℓ is
labeled as not-sent on agent j. Therefore, we have r = r'. Finally, since both i and j are alive by the end
of round k, by Corollary 9, i and j have the same non-uncertain labels for all messages from D_ℓ, and thus
they must have selected the same dictator D_{ℓ+1}. □

Note that the requirement that both i and j are alive in round k cannot simply be removed, and here is 
a simple counter-example if it is removed. Consider a system of four agents. In round 1 agent 1 crashes 
and fails to send messages to agents 2 and 4. In round 2, agent 2 crashes and fails to send a message to 4. 
In round 3, agent 3 crashes and fails to send a message to 4. All other messages are successfully sent. We 
can check that at the end of round 2 agent 3 will change the dictator from 1 to 2, but at the end of round 
3 agent 4 will change his dictator from 1 to 4, because he will label the message from 1 to 2 in round 1 as 
never-known at the end of round 3. Therefore, the dictator chain 1, 4 on agent 4 at the end of round 3 and 
the dictator chain 1, 2 on agent 3 at the end of round 2 are not prefixes of each other.
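
The prefix relation used in Lemma 19 and in the counter-example above can be checked mechanically. The following sketch is only an illustration; the helper names is_prefix and chains_comparable are hypothetical, and dictator chains are assumed to be represented as lists of agent ids.

```python
def is_prefix(a, b):
    """Return True if chain a is a prefix of chain b."""
    return len(a) <= len(b) and b[:len(a)] == a


def chains_comparable(a, b):
    """Lemma 19 property: one dictator chain is a prefix of the other."""
    return is_prefix(a, b) or is_prefix(b, a)


# Two agents alive at the end of the same round: the chains are comparable.
print(chains_comparable([1], [1, 2]))
# The counter-example: chain 1, 4 on agent 4 versus chain 1, 2 on agent 3.
print(chains_comparable([1, 4], [1, 2]))
```

The second call reproduces the counter-example: the two chains share the first dictator but diverge afterwards, so neither is a prefix of the other.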

Corollary 20 No two agents can send NEWEPOCH(·) messages in the same round.

Proof. Suppose, for a contradiction, that both agents i and j send NEWEPOCH(·) in the same round k. By
the algorithm, it is clear that k cannot be 1, and by the end of round k - 1, i is the last dictator in the dictator
chain on i, and j is the last dictator in the dictator chain on j. By Lemma 19, either i appears in the chain of
j or the reverse is true. Suppose i appears in the chain of j. By Lemma 17, i must have crashed by a round
r with r ≤ k - 1, which contradicts our assumption that i sends out NEWEPOCH(·) in round k. □

Lemma 21 If agent j receives a message from agent i in round k + 1, then the dictator chain of agent i at 
the end of round k is a prefix of the dictator chain of agent j at the end of round k + 1. 



Proof. By Lemma 19, at the end of round k, either the dictator chain of i is a prefix of the dictator chain of j,
or the reverse is true. If the dictator chain of i is a prefix of the dictator chain of j at the end of round k, then of
course it is also a prefix of the dictator chain of j at the end of round k + 1. Suppose now that the dictator
chain of j at the end of round k is a prefix of the dictator chain of i. By Lemma 10, at the end of round
k + 1, j learns the status of all messages that i learns at the end of round k. Then the condition that causes i
to change dictators by round k would also cause j to change dictators at the end of round k + 1. Therefore,
j will change the dictators exactly as in i's chain, and make its dictator chain at least as long as i's at the end
of round k. Thus the lemma holds. □

Lemma 22 Suppose that all agents are honest. When an agent terminates the algorithm at line 8 of
Alg-Dictator in some round k, all agents that are still alive in round k must have decided by the end of
round k.

Proof. Let agent i be the first agent to terminate the algorithm from line 8 of Alg-Dictator, in some
round k. Let D be the last dictator on agent i, who sends out his NEWEPOCH message at some round k'.
Suppose agent i decides on D's most preferred value in round k''. According to the algorithm, agent i terminates
the algorithm at the end of round k'' + 1, i.e., k = k'' + 1. Note that no agent terminates the algorithm before
round k, which means all lemmas from Lemma 5 to Lemma 21 hold by the end of round k. Then
any agent p that is still alive in round k must have received agent i's MsgGraph variable in round
k. By Lemma 21, agent p must have agent i's dictator chain as a prefix at the end of round k. Because
MsgGraph_i(D, j, k') ∈ {sent, never-known} for all j ∈ Π, by Lemma 10 these statuses are also learned by
agent p by the end of round k. According to the algorithm, agent p will decide on agent D's most preferred
value by the end of round k. □

With Lemma 22, we know that no agent voluntarily terminates the algorithm before all other agents
decide. Therefore, the voluntary termination of the algorithm by any agent does not affect the decisions of 
the agents, and we can apply all previous lemmas until the termination of the algorithm. 

Lemma 23 Suppose that all agents are honest. No agent can reach line 6 in Alg-Consistency and
decide ⊤.

Proof. First, it is easy to see that when all agents are honest, the F' constructed in Alg-Consistency
must be a valid failure pattern, i.e., if any agent i fails to send a message to some agent in F', i will not send
any message to any agent in the next round. Thus the first condition in line 6 cannot be true at any time.

According to the algorithm, once an agent decides through line 10 or line 14 of Alg-Dictator, he will
terminate the algorithm before reaching line 6 in Alg-Consistency in the next round. Thus it suffices to
prove that no agent can have |live_i| < n - f at any round when or before he decides.

Suppose that some agent p has |live_p| < n - f at some round t, and among all such cases, we pick
the one with the smallest round number t. This means that there are more than f agents who do not send
messages to agent p in round t. Since at most f agents can crash in a run, there must be some agent i who
terminates the algorithm at line 8 in some round earlier than round t. Then by Lemma 22, all other alive
agents have also decided before round t. Thus the lemma holds. □
Lemma 24 Suppose that all agents are honest. No agent can reach line 13 in Alg-Consistency and
decide ⊤.

Proof. Fix any failure pattern F and most preferred value vector v, any agent i, and any round
k. Let F' and v' be the failure pattern and most preferred value vector that agent i constructs in
Alg-Consistency. Let MsgGraph and MsgGraph' be the variables in the runs with (F, v) and (F', v'),
respectively.

First, for any j ∈ Π and r ≤ k, because message (j, i, r) is labeled either sent or not-sent
in MsgGraph_{i,k}, according to the construction rule at Step 3 we know (j, i, r) ∈ F if and only if
(j, i, r) ∈ F'. In the following we prove that for any (j, i, r) ∈ F and any message m, we have
MsgGraph_{j,r}(m) = MsgGraph'_{j,r}(m).

• Suppose MsgGraph_{j,r}(m) = sent, which implies that MsgGraph_{i,k}(m) = sent. Hence m ∈ F and m ∈ F'.
If in the run with failure pattern F this status is labeled by agent j according to rule 1(a) or 1(b) of
Alg-MsgGraph, then j can also label it as sent using the same rule in MsgGraph'_{j,r} with F'. If
j labels it by rule 1(c) with failure pattern F, we can follow the message chain back until we find
some agent j' who applies rule 1(a) or 1(b) to update m to sent. Then with failure pattern F', j'
will apply the same rule to label m as sent, and every agent in the message chain will label m as
sent, as they did with F, and eventually j will label m as sent in MsgGraph'_{j,r}. On the other hand, if
MsgGraph'_{j,r}(m) = sent, using a similar argument, we can show that MsgGraph_{j,r}(m) = sent. This
means that MsgGraph_{j,r}(m) = sent if and only if MsgGraph'_{j,r}(m) = sent.

• Using a similar argument as in the previous case, we can show that MsgGraph_{j,r}(m) = not-sent if and
only if MsgGraph'_{j,r}(m) = not-sent.

• Let M be the set of all messages with status never-known in MsgGraph_{j,r} with failure pattern F,
sorted by the time at which they are labeled. For any m ∈ M, suppose all messages before it in M are
labeled never-known in MsgGraph'_{j,r} with failure pattern F'. Then rule 1(c) can also be applied to this
message m so that it is labeled as never-known in MsgGraph'_{j,r} with failure pattern F' too. By
induction, we have MsgGraph'_{j,r}(m) = never-known for all m ∈ M. Again, by a similar argument,
one can show that if MsgGraph'_{j,r}(m) = never-known, we have MsgGraph_{j,r}(m) = never-known.

Finally, from the above three results, we directly have that for any message m, MsgGraph_{j,r}(m) =
uncertain if and only if MsgGraph'_{j,r}(m) = uncertain. Hence, we conclude that all the MsgGraph variables
that agent i receives in each round are the same with failure patterns F and F'.

Notice that whether an agent sends out a NEWEPOCH message in some round only depends on his
MsgGraph variable in that round. Also, if agent i receives NEWEPOCH(v_j) from some j with most preferred
value vector v, we must have v'_j = v_j in v' too, which means that if j sends out a NEWEPOCH message with
most preferred value vector v', it should also be NEWEPOCH(v_j). Thus agent i will receive the same set of
NEWEPOCH messages with (F, v) and (F', v'). Hence, we can conclude that mhist_i[1..k] = mhist'_i[1..k].
This completes the proof of this lemma. □

Lemma 25 (Termination) Suppose that all agents are honest. Every correct agent eventually decides. 

Proof. Suppose, for a contradiction, that some correct agent i does not decide. By Corollary 18, an agent
appears at most once in the dictator chain of i, so eventually the dictator on i does not change any more.
Suppose the last dictator on i is D. Since the dictator only has a finite number of changes, it is clear that agent
i will not loop forever in the repeat-until loop (lines 11-20) in any round. Since i does not decide, we know
that D ≠ i; otherwise, i would decide in line 10.

Suppose first that D ∉ live_i in some round k. Note that the reason that D ∉ live_i could be either that
D crashes, or that D has terminated its Alg-MsgGraph and Alg-Dictator tasks, but we do not need
to distinguish these two cases here. Then MsgGraph_{i,k}(D, i, k) = not-sent. By Corollary 14, i eventually
learns the status of all messages by round k. According to lines 16 and 18, i would change the dictator in
line 19, contradicting the assumption that D is the last dictator on i.

Now suppose that D ∈ live_i in all rounds k. This means that D is a correct agent and D does not
terminate his Alg-MsgGraph and Alg-Dictator tasks. Therefore, D will receive a message from i
after i already fixes its dictator chain. By Lemma 21, after D receives this message, i's dictator chain will
become a prefix of D's dictator chain, which means that D is in D's dictator chain. Then after D sets itself as
the dictator, D must successfully send NEWEPOCH messages to all live agents in a round r and then decide
in line 10. Agent i must have received this NEWEPOCH message in round r from D. By Corollary 14,
eventually i learns the status of all messages from D in round r. Since D is a correct agent, for all j ∈ Π,
(D, j, r) ∈ F, where F is the failure pattern of the run. Thus, by Lemma 5, i cannot label any message from
D in round r as not-sent. According to the condition of line 13, i will decide D's most preferred value (i
knows this value because i receives D's NEWEPOCH message containing the value). This contradicts our
assumption that i does not decide.

We have discussed all cases, all of which lead to a contradiction. Therefore, the lemma is correct. □ 

Lemma 26 (Uniform Agreement) Suppose that all agents are honest. No two agents (correct or not)
decide differently.



Proof. First, since all agents are honest, by Lemma 23 and Lemma 24 no agent decides ⊤ in
Alg-Dictator or Alg-Consistency. Thus, no matter whether an agent decides in line 10 or line 14 of
Alg-Dictator, the agent always decides the most preferred value of the current dictator.

Let agent i be the agent that decides in the earliest round among all agents. Suppose agent i decides
at the end of round r, and let D be the last dictator on i, who sends out his NEWEPOCH message in round
k ≤ r. By the algorithm we know that MsgGraph_{i,r}(D, j, k) ∈ {sent, never-known} for all agents j.

For any agent p ≠ i who is still alive at the end of round r (otherwise he cannot decide according to
our assumption), consider the message (D, p, k). First we know MsgGraph_{p,r}(D, p, k) ≠ never-known,
since k ≤ r, so either p receives the message from D in round k and labels it as sent, or p does not receive
the message and labels it as not-sent. Second, because MsgGraph_{i,r}(D, p, k) ∈ {sent, never-known}, by
Corollary 6 and Lemma 8, agent p cannot have message (D, p, k) labeled not-sent. Thus agent p must have
received agent D's NEWEPOCH message at the end of round k. Notice that agent D must have labeled
herself as the last dictator when sending out the NEWEPOCH message. By Lemma 21, agent p must
also have agent D in the dictator chain after he received D's message at the end of round k. Again, since
MsgGraph_{i,r}(D, j, k) ∈ {sent, never-known} for all agents j, by Corollary 6 and Lemma 8, none of these
messages can be labeled not-sent in agent p's MsgGraph variable in any round. Thus agent p cannot change
his dictator from agent D to any other agent, which means if he decides, the decision value must be agent
D's most preferred value too. □

Lemma 27 Alg-NewEpoch solves the consensus problem if all agents are honest.

Proof. Validity is trivial. Termination and Uniform Agreement are proven by Lemmas 25 and 26,
respectively. □
We say that a cheater pretends a crash if he stops sending messages to some honest agents in a round r
and then stops sending all messages to all honest agents in all rounds after r. We say that a cheater fakes a
message if he does not receive a message from an agent p in a round r - 1 but he labels this message as sent
in his message to all honest agents in round r. The following lemma applies to colluding groups of any size,
and shows that pretending a crash and faking a message are something a cheater has to do if he wants to
manipulate the system.
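
The definition of pretending a crash can be phrased as a predicate over a cheater's send log. The sketch below is only an illustration under stated assumptions: the name pretends_crash, the representation sent_to (a dictionary from round number to the set of honest agents the cheater sent to in that round), and the finite horizon last_round are hypothetical and not part of the paper's model.

```python
def pretends_crash(sent_to, honest, last_round):
    """Return True if the send log matches the 'pretends a crash' definition.

    The log matches if there is a round r in which some honest agent gets no
    message, and no honest agent gets any message in rounds after r.
    Because the horizon is finite, a miss in the last observed round counts.
    """
    def silent_after(r):
        # No message reaches any honest agent in rounds r + 1 .. last_round.
        return all(not (sent_to.get(s, set()) & honest)
                   for s in range(r + 1, last_round + 1))

    return any(bool(honest - sent_to.get(r, set())) and silent_after(r)
               for r in range(1, last_round + 1))


honest = {2, 3}
# Drops agent 3 in round 2 and is silent in round 3: looks like a crash.
print(pretends_crash({1: {2, 3}, 2: {2}, 3: set()}, honest, 3))
# Drops agent 3 in round 1 but keeps sending afterwards: not a pretended crash.
print(pretends_crash({1: {2}, 2: {2, 3}, 3: {2, 3}}, honest, 3))
```

The second log corresponds to the behavior that Alg-Consistency is designed to expose, since no valid failure pattern explains a dropped message followed by further sends.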

Lemma 28 If any group of cheaters can strategically manipulate the protocol Alg-NewEpoch in a run, 
then some cheater must either pretend a crash or fake a message in the run. 

Proof. By our model, a cheating agent i may change his algorithm A_i, which, given a round number r, a
message history mhist[1..r], and his private type θ_i, outputs the messages smsgs to be sent in the next round
r + 1 and a possible decision value d (perhaps ⊥). Thus i would either change the output d or the messages
smsgs to be sent.

We prove the lemma by the following case analysis on the possible cheating behavior of the cheating 
agents. 

• Case 1. No cheater changes the message output in any round, and only some cheating agent changes 
the decision output d. Suppose a cheater i is the first who changes his decision output d at the end of 
round r. 

- Case 1.1. By the end of round r, some honest agent already decides. In this case, i cannot 
change the decision value, since otherwise it would violate Uniform Agreement of consensus 
specification. 

- Case 1.2. Some honest agent j is in live_i, i.e., i still receives a message from j in round r.
Then there exists a failure pattern extension consistent with what i observes so far in which j
is a non-faulty agent. By the Termination property, j must decide a value d_j. By agreement,
all cheaters including i may only decide the same value d_j, so we have d = d_j, that is, i has to
decide on the value d_j too. Since no cheater changes the message output of his algorithm,
agent i would decide d_j if he follows the protocol; therefore i cannot benefit by changing his
decision output in this case.

- Case 1.3. No honest agent decides and all honest agents fail to send a message to i in round r.
In this case, at the end of round r, if dictator_i is still an honest agent, according to lines 15-19,
i would change the dictator to be one of the cheaters. Thus, i would eventually decide on the
cheaters' most preferred value (their preferred values are all the same by our model assumption)
if i follows the protocol, so i cannot benefit by deviating from the protocol.

By the above argument, we know that Case 1 cannot happen if cheaters manipulate the system and 
benefit. 

Case 2. At least one cheater changes some round message. If cheaters only change messages to other 
cheaters, then they do not affect the behavior of honest agents, and following the same argument as in 
Case 1 they will not benefit with such cheating actions. Thus, suppose cheater i changes his round r 
message to another honest agent j. 

- Case 2.1. Agent i is supposed to send a message to j in round r but drops this message. If i 
does not receive j's round-r message, then either j has crashed or j has already terminated his algorithm. 
In the former case, i drops a message to a crashed agent, which has no effect on the protocol 
outcome. In the latter case, j has already decided and i cannot change the decision anyway. Thus 
i does not benefit if i does not receive j's round-r message. Now suppose that i receives j's 
round-r message. In this case, i has to stop sending messages to all honest agents in round r + 1. 
Otherwise, if i still sends a message to some honest agent p in round r + 1, it is possible that j 
also sends a message to p in round r + 1 in which j labels the round-r message from 
i to j as not-sent (by rule 2(a)). Then p would receive a message from i in round r + 1 and, at the 
same time, learn from agent j that i did not send j a message in round r. Since there is no 
failure pattern in which both of these events happen, agent p 
will detect an inconsistency in Alg-Consistency and decide ⊥, violating Validity of consensus. 
By the same argument, i has to stop sending messages to all honest agents in every round after 
round r. That is, i must pretend a crash, which matches the first case covered in the statement of 
the lemma. 

- Case 2.2. Agent i is supposed to send a message to j in round r, but i changes the message into 
a wrong format. This will cause j to detect an inconsistency in Alg-Consistency and decide ⊥, 
violating Validity of consensus. Thus this case cannot occur. 

- Case 2.3. Agent i is supposed to send a NEWEPOCH message in round r to j but does 
not send it (it still sends the MsgGraph message), or it is not supposed to send a NEWEPOCH 
message but sends such a message. Since whether to send a NEWEPOCH message or not 
can be derived from the MsgGraph_i variable of agent i at the beginning of round r, this cheating 
behavior cannot be carried out alone; otherwise j will detect an inconsistency in Alg-Consistency 
and decide ⊥, violating Validity of consensus. Thus the cheaters must also cheat in some other 
way. 

- Case 2.4. Agent i is supposed to send j NEWEPOCH(v_i) in round r but instead he sends 
j NEWEPOCH(v_i'). If agent i consistently changes all his NEWEPOCH(v_i) messages to 
NEWEPOCH(v_i'), this is equivalent to i changing his private type, provided there are no other 
cheating actions combined with it. However, when fixing any failure pattern, it is clear that our protocol is 
a dictatorship protocol, meaning that it always decides on some agent's most preferred value. 
Thus cheaters cannot benefit by changing their private type in a dictatorship protocol. If agent 
i changes his NEWEPOCH(v_i) messages inconsistently, namely by sending different values to 
different honest agents, then if this is not combined with other cheating actions (such as pretending a 
crash or faking a message label), it is possible that i is correct and two honest agents decide 
two different values, violating Uniform Agreement of consensus. 

- Case 2.5. Agent i changes some message labels in the MsgGraph he sends to j in round r. 
Without loss of generality, we can assume that this is the earliest label-cheating action among 
all such label-cheating actions. Among all message labels that i cheated on, let m be the message 
of the latest round. 

* Case 2.5.1. Message m is of round r - 2 or earlier. Then m's label sent by i in round r - 1 
must be uncertain, because otherwise i changes m's non-uncertain label from round r - 1 
to round r, and j would detect the inconsistency and decide ⊥. The fact that i labels m as 
uncertain at the end of round r - 2 implies that i is neither the sender nor the receiver of 
m, because otherwise i has to label m either sent or not-sent by the end of round r - 2 
according to the algorithm. 

Sub-case A: i is supposed to label m with some non-uncertain label at the end 
of round r - 1, but i changes the label. That means i got enough information in round r - 1 
to apply rule 1(c), 2(b), 2(c), or 3(a) of Alg-MsgGraph on m. However, in 
this case, it is possible that the agents who provide this information also provide the same 
information to j (recall that no one cheats on message labels in round r - 1), and j also receives 
correct labels from i in round r - 1 corresponding to the MsgGraph state of i at the end of round 
r - 2, so j could have labeled m with the same non-uncertain label at the end of round r - 1. 
Thus, i cannot change m's label to a different non-uncertain label, since otherwise j would 
detect an inconsistency. If i cheats m's label to be uncertain, then it must also pretend that it 
did not receive enough information in round r - 1, which means i has to cheat on labels 
of some messages sent to i in round r - 1, but this contradicts our assumption that m is the 
latest-round message that i cheats on. 

Sub-case B: i is supposed to label m as uncertain at the end of 
round r - 1, but i cheats the label to some non-uncertain label. Since j could have received 
the same information as i received in round r - 1, i has to pretend that he received more 
information from another agent from which i actually did not receive a message, to avoid j 
detecting an inconsistency. However, this means that i also needs to cheat on the status of a 
message in round r - 1, contradicting our assumption that m is the latest-round message 
that i cheats on. 

The above shows that Case 2.5.1 is not possible. 

* Case 2.5.2. Message m is of round r - 1. If i is the sender of the message, i has to follow 
the algorithm and label m as sent in his round-r message, because any agent can detect an 
inconsistency if he labels m as something else. If i is neither the sender nor the receiver of the 
message, i has to follow the algorithm and label m as uncertain, again because any agent 
can detect an inconsistency otherwise. Thus let i be the receiver of m. 

Consider first that i receives m, but cheats m's label as not-sent. If sender(m) is an honest 
agent, it is possible that sender(m) is able to send a round-r message to j, and then j will 
detect an inconsistency. If sender(m) is also a cheater, it has to drop the message to j in 
order to avoid an inconsistency. This goes back to Case 2.1, and we conclude that sender(m) 
has to pretend a crash. 

Finally, consider that i does not receive m, but cheats m's label as sent in his message to 
j in round r. In this case, i has to consistently send the sent label of m to all live and 
honest agents in round r, because otherwise two honest agents may exchange messages 
in round r + 1 and detect an inconsistency in the labeling of m (one labels m as sent while 
the other labels m as not-sent). This is exactly the faking-message case in the statement of the 
lemma. 

We have exhausted all cases and shown that in all runs some cheater has to either pretend a crash or fake a 
message in order to benefit. □ 
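
The inconsistency used in Case 2.1 can be illustrated with a small sketch. It is not Alg-Consistency itself, which simulates full runs; it only checks one necessary condition of the crash model, namely that an agent reported as failing to send a message in round r cannot be seen sending in any later round. The label strings, the key layout (sender, receiver, round), and the function name are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a message is (sender, receiver, round),
# and its label is "sent", "not-sent", or "uncertain".
MsgKey = Tuple[str, str, int]


def crash_consistent(labels: Dict[MsgKey, str]) -> bool:
    """Return True if the labels fit at least the basic crash rule:
    once a sender misses a message in round r, it sends nothing after r.
    (Within round r itself, a crashing sender may reach some receivers.)"""
    first_missing: Dict[str, int] = {}
    last_sent: Dict[str, int] = {}
    for (sender, _receiver, rnd), label in labels.items():
        if label == "not-sent":
            first_missing[sender] = min(rnd, first_missing.get(sender, rnd))
        elif label == "sent":
            last_sent[sender] = max(rnd, last_sent.get(sender, rnd))
    return all(last_sent.get(s, -1) <= r for s, r in first_missing.items())


# Case 2.1: j reports that i's round-3 message was not sent,
# yet p itself received a round-4 message from i.
view_of_p = {("i", "j", 3): "not-sent", ("i", "p", 4): "sent"}
inconsistent = not crash_consistent(view_of_p)  # p would decide the bottom value
```

In this simplified view, the only way for i to keep p's labels consistent after dropping the round-r message is to send nothing from round r + 1 on, which is exactly pretending a crash.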



Note that Lemma 28 does not preclude cheaters from using other cheating actions such as deciding earlier, 
but it dictates that cheaters have to combine other cheating methods with pretending a crash or faking a 
message to be successful. With this lemma, to show that the protocol is collusion-resistant, it is enough to 
show that these two cheating actions cannot occur in any run. 

Lemma 29 In Alg-NewEpoch, no group of agents of size at most two can strategically manipulate the 
protocol by some cheater pretending a crash, when f < n - 1. 



Proof. By Lemma 28 we only need to show that cheaters cannot pretend a crash or fake a message. We first 
show that cheaters cannot pretend a crash. Suppose, for a contradiction, that a cheater i pretends a crash in 
round r by not sending a message to an honest agent j. Suppose that ℓ agents have crashed before round 
r. In round r, we can crash another f - ℓ agents, including the other cheater, so that none of them sends 
messages to agent j (that is, we pick another failure pattern that satisfies these conditions). Agent j 
will then detect at the end of round r that |live_j| < n - f and decide ⊥ (line 6 in Alg-Consistency). This 
means consensus is violated and the strategy profile is not legal. Thus cheaters cannot pretend crashes when 
f < n - 1. □ 
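
The counting argument in this proof can be checked on a small instance. The sketch below assumes that live_j contains every agent j still hears from in the round, including j itself, and that the threshold test is exactly |live_j| < n - f; the function name and the concrete agent numbering are hypothetical.

```python
def decides_bottom(n: int, f: int, live: set) -> bool:
    """Threshold test from the proof: too few live agents means that more
    than f agents appear crashed, so the agent decides the bottom value."""
    return len(live) < n - f


# Assumed instance: n = 5 agents, at most f = 2 crashes.
# Agent 1 is the honest observer j, agent 2 is the cheater i.
n, f = 5, 2
crashed = {3, 4}                      # l = 1 crash earlier, f - l = 1 crash in round r
honest_run_live = {1, 2, 3, 4, 5} - crashed
cheating_run_live = honest_run_live - {2}   # cheater 2 stays silent as well

honest_ok = not decides_bottom(n, f, honest_run_live)
cheat_detected = decides_bottom(n, f, cheating_run_live)
```

With f real crashes already used up by the failure pattern, the one extra silent agent pushes |live_j| below n - f, which is why the pretended crash is unsafe for the cheater.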

Lemma 30 In Alg-NewEpoch, no group of agents of size at most two can strategically manipulate the 
protocol by some cheater faking a message. 

Proof. Suppose, for a contradiction, that cheater i fakes a message in round r by labeling a message from 
p in round r — 1 as sent while i does not receive this message, and sending this label to all honest agents. 
Without loss of generality, we assume no faking message behavior by any cheater has occurred in round 
r — 1 or earlier. We prove this case through the following series of claims. 

Claim 1. In the run where i fakes the round-(r - 1) message from p to i in round r and some cheater 
benefits from this cheating behavior, p must be in the dictator chain of any agent who decides. 

Proof of Claim 1. Suppose that p is not on the dictator chain. If i did not fake the message, the dictator 
chain would remain the same, and thus no cheater can benefit from this cheating behavior, a contradiction. 

Claim 2. In some run in which i conducts the above cheating action, p is alive at the end of round r - 2. 

Proof of Claim 2. Since i pretends that p sends a message to i in round r - 1, and this cheating behavior 
does not cause any agent to detect an inconsistency, there must be a run in which p is indeed alive at the end of 
round r - 2 and sends a message to i in round r - 1. 

Henceforth, we consider a run R in which p is alive at the end of round r - 2, i fakes the round-(r - 1) 
message from p to i in round r, and some cheater benefits from this cheating behavior. 

Claim 3. In the MsgGraph sent out by i in round r, all round-(r — 2) messages addressed to p are labeled 
sent or not-sent, and the labels match the failure pattern of run R. 

Proof of Claim 3. If i would successfully receive the message from p in round r — 1, this message would 
contain sent or not-sent labels of all messages addressed to p in round r — 2, and i cannot fake these labels 
in his round-r message because p may have successfully sent these labels to other honest agents, who might 
be able to detect inconsistency if i does so. Note that p itself could be a cheater, but by our assumption p 
does not cheat in round r — 1. 

Claim 3 means that at the end of round r — 1 agent i knows the status of all messages addressed to p in 
round r — 2. 

Claim 4. For any agent q, if p does not receive q's message in round r - 2, then q must have failed to 
send out some message in round r - 3, and i labels this message as not-sent at the end of round r - 1. 

Proof of Claim 4. By Claim 3, agent i would label the message from q to p in round r - 1 as not-sent, 
which correctly matches the failure pattern. Thus, according to Algorithm Alg-MsgGraph, if i were an 
honest agent, i could only apply rule 2(b) for this message, which means some message from q in round 
r - 3 or earlier is labeled not-sent by i, and indeed q fails to send this message. 

Claim 5. For any agent q, if p does not receive q's message in round r — 2, then at the end of round 
r — 2, p must have labeled some message from q in round r — 3 and all messages from q in round r — 2 as 
not-sent. 

Proof of Claim 5. By Claim 4, q must have failed to send a message m in round r - 3, and i labels m as 
not-sent by round r - 1. If agent i learns this label at the end of round r - 3, then i would pass this label to p 
in round r - 2. If agent i learns this label in round r - 2 or later, then this label must be passed to i through 
a chain of messages. Let the message from x to y in round r - 2 be a message on this chain. Then agent x must 
have also passed the not-sent label of m to p in round r - 2. This is because, if x fails to send a message 
to p in round r - 2, by Claim 4 x should have crashed by round r - 3 and thus cannot send a message to y 
in round r - 2. Therefore, p would learn the not-sent status of m at the end of round r - 2. Since m is a 
message of round r - 3, p would label all messages from q in round r - 2 as not-sent. 

Claim 6. By the end of round r - 2, agent p has learned the status of all messages of round r - 3 or 
earlier. 

Proof of Claim 6. Consider an arbitrary message m from x to y in round r - 3. If p receives a message 
from x or y in round r - 2, then p would learn the status of m. Suppose that p does not receive messages 
from x and y in round r - 2. Then by Claim 5 p would label all messages from x and y in round r - 2 as 
not-sent. According to rule 3(a) of Alg-MsgGraph, in this case p would label m as never-known. By 
Lemma 12, p would learn the labels of all messages by round r - 3. 

Claim 7. In run R, at the end of round r - 2, p sets itself as the dictator. 

Proof of Claim 7. By Claim 1, p must be on the dictator chain. If p is the first on the dictator chain, 
the claim is trivially true. If not, let d be the dictator before p in the dictator chain. According to Lemma 17, 
there exists a message m' of round r' from d to p, such that p is alive at the end of round r' and d fails to 
send m' to p. Since p crashes in round r - 1, we know that r' ≤ r - 2. If r' = r - 2, then by Claim 
5, p would have labeled some message from d in round r - 3 as not-sent, which contradicts condition (d) 
of Lemma 17. If r' ≤ r - 3, by Claim 6, p learns the status of all messages by round r'. According to 
Algorithm Alg-Dictator, p should have changed the dictator from d to p at the end of round r - 2. 

Claim 8. Agent p would send the NEWEPOCH(v_p) messages to all agents in round r - 1. 

Proof of Claim 8. If agent p knows that he is the dictator by round r - 3, then p would decide at the end 
of round r - 2, and no one can change the decision any more. Thus, p must learn of his dictatorship at the end 
of round r - 2. According to Algorithm Alg-Dictator, p would send the NEWEPOCH(v_p) messages to 
all agents in round r - 1. 

Claim 9. Agent i crashes in run R. 

Proof of Claim 9. Suppose that i does not crash. By Claim 8, i does not receive the NEWEPOCH(v_p) 
message from p in round r - 1. If in run R agent i does not become a dictator after p, then faking the 
message from p to i in round r - 1 has no effect. If i indeed becomes the next dictator, then since i does not 
crash, i would decide on his most preferred value, and thus i will not benefit from cheating. Thus, i must 
eventually crash in run R. 

Claim 10. Agent p must be an honest agent. 

Proof of Claim 10. If p is also a cheater, then together with Claim 9 we know that both cheaters p and 
i crash in run R. Since we only have two cheaters, and by our definitions cheaters' utility is fixed to zero 
in failure patterns in which they crash, in run R no cheater will benefit, contradicting the definition of R. 
Note that this is the place where we use the condition c = 2. If c = 3, p actually could be a cheater, and 
manipulation behavior exists (see an example in Section 3.2.1). 

We are now ready to reach the final contradiction. Consider a run R' such that (a) R' is the same as run 
R up to round r - 2; (b) in round r - 1 agent p successfully sends his MsgGraph and NEWEPOCH(v_p) 
messages to all agents but i; (c) in round r the other cheater (if one exists) crashes without sending out any 
messages; and (d) i does not crash in R'. At the end of round r - 1, the message history of i is the same in 
the two runs, so i could still cheat in R' as in R. Let i do so in R'. 

Even though we crash the other cheater in R', run R' is still a run with at most f crash failures, because 
by Claim 9 i crashes in R but i is correct in R'. In R', agent i does not receive v_p directly from p, he will 
not receive v_p from any other honest agent according to the protocol, and he will not receive v_p from the 
other cheater since the other cheater crashes at the beginning of round r. By Claim 10 agent p is an honest 
agent. Therefore, in R' i will not know the value of v_p. 

Figure 2: Counter-example of Alg-NewEpoch when f = n - 1. 

In R' no message from p in round r - 1 can be labeled not-sent by any agent other than i, and i himself 
cheats this label to be sent. According to the algorithm, in this case the final decision in run R' must be v_p, 
the most preferred value of p. However, i does not receive v_p from any agent, and thus he cannot decide, a 
contradiction. □ 

Theorem 2 Protocol Alg-NewEpoch is a (2, f)-resilient consensus protocol for any f ≤ n - 2. 

Proof. The result follows directly from Lemmas 27, 28, 29, and 30. □ 



3.1.5 Counter-example for Alg-NewEpoch when f = n — 1 

Figure 2 describes a counter-example for Alg-NewEpoch when f = n - 1. To make the example clear, it 
only shows the messages sent by pi in round 1, messages sent by p2 in round 2 and messages sent or received 
by p5 in round 3. Here, p^ and p^ are cheaters. In round 1, the first dictator fails to send NEWEPOCH to 
P2, but p2 only tells p^ about it in round 2 and then crashes. Let m be the message from pi to p2 in round 
1. Then, in round 3, p^ sets m to be not-sent and knows that ps sets it to be uncertain in the end of round 2. 
Note that p^ does not know the status of m in ps's MsgGraph in the end of round 3 since p^ does not know 
whether p^ sends message to p^ in round 3. In round 4, p^ pretends a crash and does not send any message 
to p3. There are several cases for ps to consider: 

1. In round 4, p^ sends a message to p^ and tells p^ that p^ does not send message to p^ in round 3. Then 
P5 decides pi's proposal value. In this case p^ can benefit if pi's proposal value is better than p^s 
proposal. 

2. In round 4, ps sends a message to ps and tells ps that p^ sends a message to p3 in round 3, and p^ does 
not send message to ps in round 4. Then, ps must send a NEWEPOCH message with his proposal in 
round 4 since ps knows all status of messages sent by p2 in round 2. Then, p^ learns the proposal of 
Ps and decides ps's proposal value. Note that a key point here is that even though p^ pretends a crash 
at the beginning of round 4, he is still able to receive the round-4 message from p^, which contains 
the critical information of ps's proposal value. 

3. In round 4, both ps and p^ send messages to p^. By the same argument, p^ must send NEWEPOCH 
message, ps waits until round 5. If p4 sends message to p^ and tells it ps fails to send NEWEPOCH 
message to p4 in round 4, then ps decides p^s proposal value, otherwise p^ decides ps's proposal 
value. Note that it requires p^ to be a cheater, otherwise p4 will not send messages to p5 when it 
knows that p^ has crashed. 

3.2 Alg-NewEpoch2: Deterministic (2, f)-resilient consensus protocol for f ≤ n - 1 



Alg-NewEpoch is not (2, n - 1)-resilient, as shown by the counter-example in Section 3.1.5, because 
simply counting the number of crash failures can no longer deter the manipulation of pretending crash 
failures. We adapt Alg-NewEpoch to a new protocol Alg-NewEpoch2 to deal with this issue. The only 
difference in the new protocol is the following: when the dictator finds that he can send the NEWEPOCH 
message, he splits his most preferred value into two parts and sends them in two consecutive rounds with two 
separate NEWEPOCH messages, and he decides the value by the end of the second round. Another agent 
can recover the dictator's most preferred value if and only if he knows the content of both NEWEPOCH 
messages. Note that here we assume that the proposal values need at least two bits to represent, which is 
consistent with our assumption that |V| > 3. 
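
The two-part transmission can be sketched as follows. The text does not fix how the value is split, so the sketch assumes one simple choice: the value is encoded as a `bits`-bit integer and cut into a high half and a low half, one per NEWEPOCH round. The function names and the integer encoding are assumptions for illustration only.

```python
from typing import Optional, Tuple


def split_value(value: int, bits: int) -> Tuple[int, int]:
    """Split a bits-wide value into (first_part, second_part).
    Requires at least two bits so that both parts carry information."""
    if bits < 2:
        raise ValueError("need at least two bits to split a value")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given width")
    low_bits = bits // 2
    return value >> low_bits, value & ((1 << low_bits) - 1)


def recover_value(first: Optional[int], second: Optional[int],
                  bits: int) -> Optional[int]:
    """Rebuild the value; None models a NEWEPOCH part that never arrived."""
    if first is None or second is None:
        return None
    return (first << (bits // 2)) | second


# A dictator with preferred value 13 (binary 1101) sends 3, then 1.
first, second = split_value(13, 4)
full = recover_value(first, second, 4)      # both rounds received
partial = recover_value(first, None, 4)     # second round missed
```

An agent that misses the second round, as a cheater pretending a crash does in the analysis, holds only `first` and cannot name the dictator's value.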

The above change, together with the requirement that agents stop sending messages to crashed agents, 
successfully guards against pretending-crash manipulations. Intuitively, the risk when a cheater pretends a 
crash is that he may miss the NEWEPOCH message from a new dictator and thus cannot decide on the 
most preferred value of the new dictator. In Alg-NewEpoch it is possible that the cheater receives this 
NEWEPOCH message in the same round in which he pretends a crash, making him safe. However, in 
Alg-NewEpoch2, the most preferred values are split into two parts, and our analysis shows that the cheater 
would miss the second part if he pretends a crash, effectively defeating this cheating behavior. 

Lemma 31 Suppose that 1 ≤ f ≤ n - 1. In Algorithm Alg-NewEpoch2, if no cheater fakes any 
messages, then no group of agents of any size can strategically manipulate the protocol by some cheater 
pretending a crash. 



Proof. First, when f < n - 1, Lemma 29 can be applied to Alg-NewEpoch2 with the same proof, and 
thus we only consider the case of f = n - 1. 

Suppose, for a contradiction, that there exists a failure pattern F in which some agent p can strategically 
manipulate the protocol by pretending a crash in round r. Assume that agent p does not send a message to 
some honest agent h in round r and then stops sending messages to all honest agents in all rounds after r. 

Let d be the last agent from which agent h has received a NEWEPOCH message before round r (including 
himself), and let k be the round in which d sends its first-round NEWEPOCH message. Thus k ≤ r - 1. 
If there is no such agent, let k = 0. Now we focus on the message status of all NEWEPOCH messages 
sent from agent d in rounds k and k + 1 (i.e., all NEWEPOCH messages sent from d), and consider the 
following scenarios (the scenarios listed below may overlap, but they cover all possible cases): 

1. d = h. If h has finished sending his NEWEPOCH messages by round r - 1, then h will decide 
in round r and no one can change the decision. If h is still sending his NEWEPOCH messages in 
round r, then only the status of these NEWEPOCH messages may cause a change of dictator. Thus, 
whether agent p sends h a message or not in round r cannot affect the final decision. In this case, we 
let p send h a message in round r and reconsider the scenario. 

2. k = 0. This means agent h has never received a NEWEPOCH message from any agent (including 
himself). Now consider the following failure pattern F': in the first r - 1 rounds, F' is the same as F. 
Then all agents except p and h crash at the beginning of round r, without sending out any messages, 
and agents p and h are correct. Because F' is consistent with MsgGraph_{p,r-1} for agent p at the 
end of round r - 1, p would pretend to crash in F' as he does in F. Since agent h will 
never receive any NEWEPOCH messages from other agents in F', she will eventually send out her 
own NEWEPOCH messages in some rounds r' and r' + 1 (with r' > r) and then decide on her own 
most preferred value. Notice that agent p does not send h any message in round r, thus h will not send 
p any message in any round later than r. This means p can never receive the NEWEPOCH message 
from agent h in round r' + 1, nor can he get this information from other agents (because they have all 
crashed at the beginning of round r). Therefore, p does not know which value agent h decides on 
in this case and thus is not able to cheat. 

Note that this is the case where we require that the most preferred value of h be split across two 
rounds r' and r' + 1, since we can only guarantee that p does not receive the second part of the value 
in round r' + 1. If h were to send its entire proposal in round r', then in the case of r' = r, h would 
send his proposal to p in round r since h has not detected that p has crashed, and p's cheating would 
be successful. This is exactly the case shown in the counter-example in Section 3.1.5. 



3. k = r - 1, or there exists an agent q ∈ Π that is alive at the end of round r - 1, and some NEWEPOCH 
message m sent from d is labeled as not-sent in MsgGraph_{q,r-1}. In the following we show that in this 
case, there always exists a NEWEPOCH message m' from agent d and a failure pattern F' consistent 
with MsgGraph_{p,r-1} for p at the end of round r - 1, such that MsgGraph_{h,r}(m') = not-sent in F'. 
If this is true, we crash all other agents except p and h at the beginning of round r + 1 and apply a 
similar argument as in Case 2 to show that h will be the final dictator but p does not know the most 
preferred value of h. Thus, p cannot cheat in this case. 

We consider the following subcases: 

(a) k = r - 1. In this case, let F' be a failure pattern (consistent with MsgGraph_{p,r-1} for p at the 
end of round r - 1) such that d crashes in round r and does not send a message to h. Then in F' 
h will label the message from d in round r as not-sent. 

(b) q ≠ p. Let F' be the failure pattern in which agent q sends the status of m to h in round r. Thus 
we have MsgGraph_{h,r}(m) = not-sent in the run with failure pattern F'. 

(c) q = p and agent p learns the status of m before round r - 1. Then p must have sent this 
information to agent h in round r - 1, and agent h should also label m as not-sent in MsgGraph_{h,r} 
in failure pattern F. Note that we assume that p does not fake any messages in the run. 

(d) q = p and agent p learns the status of m from some agent q' ≠ d in round r - 1. Then there 
exists a failure pattern F' consistent with MsgGraph_{p,r-1}, in which q' also sent this information to 
agent h in round r - 1. Hence we have MsgGraph_{h,r}(m) = not-sent in F'. 

(e) q = p and agent p is the (supposed) receiver of this NEWEPOCH message m and does not 
receive it from agent d in round r - 1. In this case, p does not know whether agent d has sent 
the NEWEPOCH message to h successfully, which means there is a consistent failure pattern 
F' in which agent h does not receive the NEWEPOCH message from d either. 

4. k < r - 1, and for any agent q that is alive at the end of round r - 1 and any NEWEPOCH message 
m sent from d, MsgGraph_{q,r-1}(m) ≠ not-sent. In this case, by Lemma 15 we know that no agent 
can label any of agent d's NEWEPOCH messages as not-sent in any later round. Thus, according to the 
algorithm, the final decision value will be agent d's most preferred value regardless of whether agent 
p crashes. Hence agent p cannot benefit by pretending a crash. 

Figure 3: Counter-example for Alg-NewEpoch2 when c = 3. 

These are all the possible cases, and we have shown that in none of them can agent p cheat. This finishes 
the proof. □ 

Theorem 3 The protocol Alg-NewEpoch2 is a (2, f)-resilient consensus protocol for any f ≤ n - 1. 

Proof. First, Lemma 28 can be applied to Alg-NewEpoch2 with the same proof, which means that in 
Alg-NewEpoch2 some cheater must either pretend a crash or fake a message in order to benefit. Second, 
Lemma 30 can also be applied to Alg-NewEpoch2 with the same proof, which means no cheater can 
fake any messages. Finally, Lemma 31 states that no cheater can pretend crashes when cheaters do not fake 
messages. Together, these show that cheaters have no valid cheating actions, and thus the statement of the 
theorem holds. □ 



3.2.1 Counter-example for Alg-NewEpoch2 when c = 3 

Figure 3 describes a possible cheating example for protocol Alg-NewEpoch2 when there are 3 colluders. 
Let the colluders be p1, p2 and p^. Suppose in round 1, no agent crashes and agent p1 successfully sends his 
first-round NEWEPOCH messages to all other agents. In round 2, p1 crashes and fails to send his second-round 
NEWEPOCH message only to p2. No other agents crash in round 2. Then in round 3 and later 
rounds, p2 can fake the message p1 sent to him in round 2. It can be verified that no agent will detect any 
inconsistency and the final decision value of the system will always be p1's most preferred value if p2 cheats 
in this way. In the following, we describe a scenario where the colluders actually benefit from this cheating 
behavior. 

Consider the failure pattern in which p2 crashes at the beginning of round 4, without sending out any 
messages. Agents ps and p^ are correct agents. According to the algorithm, if all agents are honest, the final 
decision value should be ps's most preferred value. But as we discussed above, if p2 fakes the message that 
p1 sent to him in round 2, the final decision will be p1's most preferred value, and thus p^ can benefit from 
this cheating behavior, since p^'s most preferred value is the same as p1's. 

We need three colluders here for the following two reasons. First, when p2 fakes the receipt of the 
message sent by p1 in round 2, he runs the risk that this causes the final decision to be p1's most preferred 
value, and thus if p1 were an honest agent, p2 would not know p1's most preferred value and could not decide 
(if his colluder also crashes before telling him the value). Hence both p2 and p1 have to be cheaters. Second, 
for cheaters to benefit, p2 has to crash, since otherwise p2 would be the next dictator and there is no need to 
cheat. Since both p1 and p2 have crashed in the run, there has to be a third cheater left to take the benefit. 

This example shows a scenario in which a colluder has to crash in order for other cheaters to benefit. This 
shows the subtlety involved when our model allows crashes of colluders. 

3.3 Alg-RandNewEpoch2: Randomized (n - 1, f)-resilient consensus protocol for any f ≤ n - 1 

Alg-NewEpoch2 is not resilient to three colluding cheaters, as shown by the counter-example in 
Section 3.2.1, because the colluders could successfully manipulate the protocol by faking message 
receipts. In this section, we show that if we allow agents to use randomness in their algorithms, we can boost 
the protocol to resist n - 1 colluders. 

A randomized protocol is one in which every agent has access to random bits as part of his local state. 
A randomized consensus protocol is (c, f)-resilient if it solves consensus in a system with at most f crash 
failures regardless of the random bits used by agents, and any strategic manipulation by at most c colluders 
would lead to a violation of consensus with high probability. 

The randomized algorithm Alg-RandNewEpoch2 is a further adaptation of Alg-NewEpoch2. 
Alg-RandNewEpoch2 maintains the same structure as Alg-NewEpoch2, except that messages are 
associated with random numbers as follows. Every message is associated with a random number created 
by the message sender. The MsgGraph_i of agent i keeps track of the random number of every message 
that i knows. When i sends a message m to j in a round r, he first generates a copy MG of MsgGraph_i 
in which all random numbers associated with messages he sends are removed. Then he generates a unique 
random number ρ_m associated with m, and sends MG and ρ_m together to j. When i receives a message 
(MsgGraph_p, ρ_p) from an agent p in round r, i records ρ_p in its MsgGraph. The consistency check in 
Alg-Consistency is revised as follows. When agent i simulates any message (MsgGraph_p, ρ_p) sent by 
an agent p (line 10 of Alg-Consistency): if p = i, then i uses the original random number he generated 
in the real run as ρ_p; if i has received the random number from p in the real run, he uses the received number 
as ρ_p; otherwise, he leaves ρ_p as ⊥. Finally, we add an additional round at the beginning for agents to 
exchange MsgGraphs and random numbers but no NEWEPOCH messages. Using randomness effectively 
stops faking message receipts with high probability because cheaters do not know the random bits used in 
advance. 
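
The effect of attaching random numbers can be illustrated with a short sketch. It does not reproduce the revised Alg-Consistency; it only models the single step that matters for faking: a claimed receipt must carry the sender's random number, and a cheater who never saw that number can only guess it. The 64-bit width, the uniform distribution, and all function names are assumptions, since the text does not specify how the random numbers are drawn.

```python
import secrets
from fractions import Fraction
from typing import Optional

NONCE_BITS = 64  # assumed width; the paper does not fix one


def fresh_nonce(bits: int = NONCE_BITS) -> int:
    """Random number a sender attaches to one outgoing message."""
    return secrets.randbits(bits)


def receipt_consistent(claimed: Optional[int], actual: int) -> bool:
    """An honest checker accepts a claimed receipt only if the echoed
    random number equals the one the sender really generated."""
    return claimed == actual


def forge_probability(bits: int = NONCE_BITS) -> Fraction:
    """Chance that a single blind guess of a uniform bits-wide number is right."""
    return Fraction(1, 2 ** bits)


rho = fresh_nonce()
honest_receipt = receipt_consistent(rho, rho)    # receiver really got rho
missing_nonce = receipt_consistent(None, rho)    # faker has nothing to echo
```

Under these assumptions a single forged receipt passes with probability 2^-64, which is the sense in which faking is stopped "with high probability" rather than with certainty.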

Theorem 4 Protocol Alg-RandNewEpoch2 is a randomized (n - 1, f)-resilient consensus protocol for 
any f ≤ n - 1. 

Proof. If all agents are honest, it is obvious that no agent can find any inconsistency about the random 
number attached with each message. Since the remaining part of the new algorithm is the same as the old 



algorithm. Hence Lemma 27 still holds here, i.e., the new algorithm solves consensus problem if all agents 
are honest. 



Using the same argument, it can be verified that Lemma 28 and Lemma 31 also hold for the new
algorithm. Hence, in order to prove that no group of agents can strategically manipulate the system,
it suffices to show that no agent can fake any messages in the system.

Suppose that a cheater p does not receive a message from an agent q in some round r. If r = 1, which
means this is the extra round that we added to the algorithm, then whether p fakes this message or not
will not affect the outcome of the algorithm. If r > 1, pick an honest agent h that is still alive at the
end of round r. Let m be the message that h sends to q in round $r - 1$ and let $\rho$ be the random number
h associates with message m. Note that if p wants to fake the message that q sends to p in round r, then
in p's round r + 1 message to h, p has to include $\rho$ as the random number associated with message m;
otherwise h will notice this inconsistency and decide on $\bot$. However, by the end of round r, agent p
does not know the value $\rho$, which means with high probability he cannot fake this message. This
completes the proof. □
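
To make "with high probability" concrete, a rough bound can be written down under assumptions that
the text does not fix: each random number is drawn uniformly from k-bit strings, independently of
everything the cheater has seen, and the colluders attempt at most M faked receipts in a run. Both k
and M are illustrative parameters, not quantities defined in the protocol.

```latex
\Pr[\text{a given faked receipt passes the check}] \le 2^{-k},
\qquad
\Pr[\text{some faked receipt passes the check}] \le M \cdot 2^{-k}
\quad \text{(union bound)}.
```

Under these assumptions the failure probability becomes negligible once k grows faster than $\log_2 M$.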

3.3.1 An example that (c, f)-resiliency does not imply (c, f')-resiliency for f' < f

We use protocol Alg-RandNewEpoch2 to show an example that (c, f)-resiliency does not imply
(c, f')-resiliency for f' < f. To be clear, we use Alg-RandNewEpoch2(f) to denote the actual protocol
with parameter f. Consider a system of 5 agents, c = 3, f = 4, and f' = 2. Agents 2, 4 and 5 are colluders.
Theorem 4 shows that Alg-RandNewEpoch2(4) is (3, 4)-resilient in this system. Note that with f = 4,
the second condition in line 4 of Alg-Consistency is always true and thus useless. We show that
Alg-RandNewEpoch2(4) is not (3, 2)-resilient by providing the following manipulation scenario.

Consider a system with at most f' = 2 crash failures, where f' is common knowledge to all agents. When
agent 2 fails to receive a message from agent 1 in round 1, agent 2 immediately pretends a crash in round 2
without sending out any messages. We argue that this manipulation is safe for the colluders. First,
pretending a crash will not be detected by the consistency check. Second, since f' < c = 3 and agent 1 has
already crashed, agent 2 is sure that one of the remaining colluders 4 or 5 must be alive, in which case
agent 2 can always get the decision value from the alive colluder. Therefore, consensus can always be
achieved.

We now describe a scenario where colluders benefit. Notice that agent 1 is the first dictator in the run.
Suppose that agent 1 successfully sends his round-1 messages to all other agents except agent 2 before agent
1 crashes in round 1. If agent 2 pretends a crash at the beginning of round 2, all other agents would eventually
label the message from agent 1 to agent 2 in round 1 as never-known. According to our algorithm, all other
messages from agent 1 in round 1 will be labeled either as sent or never-known, and all agents eventually
decide on agent 1's most preferred value $v_1$. However, if agent 2 follows the protocol, and there is no crash
failure in round 2, at the beginning of round 3 agent 2 would become the new dictator and start sending his
NEWEPOCH messages. In this case, if agent 2 crashes in round 3 without sending a message to agent 3,
and this is the last crash failure in the run, agent 3 would become the final dictator and the decision value of
the run would be agent 3's most preferred value $v_3$. If colluders 4 and 5 prefer $v_1$ over $v_3$, then they would
benefit from agent 2 pretending the crash in round 2.

Therefore, we have that Alg-RandNewEpoch2(4) is (3, 4)-resilient but not (3, 2)-resilient. The key
is that, when there are more possible failures, agent 2, who pretends a crash, runs the risk that all his
partners crash and he cannot get the final decision value, but when the number of possible failures
decreases, he no longer has this risk.
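
The counting argument behind this example can be checked by brute force. The sketch below is only an
illustration: the function name and the way the crash budget is modeled (the bound on failures minus the
crashes already observed) are assumptions made for the example, and it checks only whether some partner
of the pretending agent is guaranteed to survive, not the full protocol.

```python
from itertools import combinations

def partner_always_alive(agents, crashed, pretender, partners, max_failures):
    """Return True if, however the remaining crash budget is spent,
    at least one partner of the pretending agent stays alive."""
    budget = max_failures - len(crashed)
    # Agents that may still genuinely crash (the pretender only fakes his crash).
    candidates = [a for a in agents if a not in crashed and a != pretender]
    for k in range(budget + 1):
        for extra in combinations(candidates, k):
            if all(p in extra for p in partners):
                return False  # every partner can crash: pretending is risky
    return True

agents = [1, 2, 3, 4, 5]
# Agent 1 has crashed; agent 2 pretends a crash; his partners are 4 and 5.
safe_with_two = partner_always_alive(agents, {1}, 2, [4, 5], max_failures=2)
safe_with_four = partner_always_alive(agents, {1}, 2, [4, 5], max_failures=4)
print(safe_with_two, safe_with_four)  # True False
```

With at most 2 failures only one more crash is possible, so agents 4 and 5 cannot both fail; with at
most 4 failures both partners may crash, which is exactly the risk that deters agent 2.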

3.4 Protocol complexity and summary of collusion resistance techniques 

We now discuss the complexity of the protocols, and then summarize the techniques our protocols use to
defend against strategic manipulations.

Protocol complexity. Let $f' \le f$ be the actual number of crashes in a run. For Alg-NewEpoch, each
crash failure delays the decision by at most 2 rounds by causing a change of dictatorship. Thus, it takes at
most $2f' + 1$ rounds for all agents to decide and $2f' + 2$ rounds for all agents to terminate the algorithm.
For Alg-NewEpoch2, each crash delays the decision by at most 3 rounds since it sends two rounds of
NEWEPOCH messages. Its round complexity is therefore $3f' + 2$ for decision and $3f' + 3$ for termination.
Alg-RandNewEpoch2 only needs one more round than Alg-NewEpoch2. For message complexity,
at most $O(n^2 f')$ messages are exchanged, with the size of each message at most $O(n^2 f)$ due to the size of
MsgGraph. It is also easy to check that local computation on each agent is polynomial in n and f'.
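
The round bounds above can be tabulated with a small helper. This is a sketch for illustration: the
function name is hypothetical, and the entry for Alg-RandNewEpoch2 assumes that its single extra round
shifts both the decision and the termination bound of Alg-NewEpoch2 by one.

```python
def round_bounds(protocol, crashes):
    """Return (rounds to decide, rounds to terminate) for `crashes` actual crash failures."""
    # (rounds of delay per crash, additive constant for the decision bound)
    params = {
        "Alg-NewEpoch": (2, 1),       # 2f' + 1 to decide
        "Alg-NewEpoch2": (3, 2),      # 3f' + 2 to decide
        "Alg-RandNewEpoch2": (3, 3),  # assumed: one round more than Alg-NewEpoch2
    }
    per_crash, constant = params[protocol]
    decide = per_crash * crashes + constant
    return decide, decide + 1  # termination takes one round more than decision

for name in ("Alg-NewEpoch", "Alg-NewEpoch2", "Alg-RandNewEpoch2"):
    print(name, round_bounds(name, crashes=2))
```

For a run with two crashes this gives (5, 6), (8, 9) and (9, 10) rounds respectively.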

Summary on techniques for resisting strategic manipulations. Our protocols employ a number of tech- 
niques defending against strategic manipulations, which may find applications in other situations. 

Consistency check combined with a punishment strategy (deciding $\bot$ in our case) builds the first line
of defense. It effectively restricts the possible manipulations of a cheater. A particular form of consistency
check is to count the number of observed failures and check if it exceeds the maximum number of possible
failures f. When $f < n - 1$, this check stops agents from pretending a crash failure, one of the important
forms of strategic manipulations.
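
The failure-counting check can be sketched in a few lines. The names below are hypothetical, and the
sketch covers only this one check; the full Alg-Consistency procedure performs further checks that are
not reproduced here.

```python
BOTTOM = "bottom"  # stands in for the punishment decision (the bottom symbol)

def failure_count_ok(observed_crashed, f):
    """Pass only if the number of distinct agents observed as crashed is at most f."""
    return len(set(observed_crashed)) <= f

def decide(value, observed_crashed, f):
    """Decide `value` if the check passes; otherwise punish by deciding BOTTOM."""
    return value if failure_count_ok(observed_crashed, f) else BOTTOM

# With f = 2, seeing three crashed agents means at least one crash was pretended.
print(decide("v", {1, 2, 4}, f=2))  # bottom
print(decide("v", {1}, f=2))        # v
```

An agent can observe at most n - 1 other agents as crashed, so when f = n - 1 this check can never
fail, which is why it offers no protection in that case.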

However, consistency check is far from enough. For example, when $f = n - 1$, pretending crash failures
cannot be detected. In this case, we use the techniques of not sending any messages to a crashed agent and
splitting critical information (the most preferred proposal in our case), to achieve the effect that cheaters
may risk not receiving the critical information should they pretend a crash.

Besides pretending crashes, we know from our analysis that another important form of
manipulation is for a cheater to fake a message that he does not receive. When sufficient random bits are
available to all agents, we can let all agents attach a unique sequence of random bits to each message,
and by checking the random bits received as part of consistency checks, faking messages can be prevented.
However, when random bits are not available, defending against faking messages is much more difficult,
especially when there are colluding agents. Our protocol Alg-NewEpoch combines several techniques to
guard against message-faking manipulations. One technique is the maintenance of consistent message status
through the message graph exchange component Alg-MsgGraph. The second technique is the carefully
designed conditions for claiming new dictatorship: in particular, the new dictator is selected among agents
who detect the failure of the old dictator, which reduces the incentive to fake message receipts from the
old dictator.

While our protocols are designed specifically for solving the problem of synchronous consensus, we
believe that the above-mentioned techniques could potentially be used in the design of other distributed
protocols to defend against strategic manipulations.

4 Impossibility of resisting colluders with private communications 

In this section, we consider a modified synchronous round model in which colluders can communicate with 
one another through private communication channels after one synchronous round ends but before the next 
round starts. These private channels provide new opportunities for colluders to manipulate the protocol. 
For example, if cheater i receives a message from p but cheater j does not receive a message from p in 
the same round r, i and j through their private communication would know that p crashes in this round, 
and thus in round r + 1 it is safe for i to pretend not receiving a message from p in round r, a case not 
feasible without private communication. Indeed, our theorem below shows that with private communication
no (2, f)-resilient consensus protocol exists for any $1 \le f \le n - 1$, even with randomness.

Theorem 5 In a synchronous system with private communication channels among colluders, there is no
randomized (2, f)-resilient consensus protocol with n > 3 agents, for any $1 \le f \le n - 1$.

Proof. Suppose, for a contradiction, that such a protocol exists. By Theorem 1, for any failure pattern F the
protocol has a dictator under F.

Suppose without loss of generality that agent 1 is the dictator in the failure-free run. Now if we change the
failure pattern by deleting the messages sent by agent 1 one by one, round by round starting from the final
round before agent 1 decides, there must exist a failure pattern F where the dictator is still agent 1, but if
we delete (any) one more message sent from agent 1, the dictator will change. Then in failure pattern F, let
round r be the last round in which agent 1 has sent at least one message. We consider two cases here:

(1) Agent 1 sends only one message in round r. Suppose agent 1 sends a message only to agent 2 in 
round r. 

First we show that if this message is removed, the new dictator can only be agent 2. Assume otherwise 
that the dictator becomes another agent, say agent 3. Then we consider the case that agents 1 and 2 
are in the colluding group. If in round r agent 2 does not receive a message from agent 1, he could 
cheat by pretending that he has received this message. In this case, the algorithm will choose agent 1 
as the dictator instead of agent 3, which will benefit agent 2. 

Now consider the case where agent 2 and some other agent, say agent 3, are in the colluding
group. Notice that agent 2 receives a message from agent 1 but agent 3 does not receive a message 
from agent 1 in round r, thus through their private communication at the end of round r, both agent 2 
and agent 3 will know that agent 1 crashes in this round. Hence agent 2 could pretend that he did not 
receive the message from agent 1 in round r. In this case, the dictator will become agent 2 instead of 
agent 1, which benefits agents 2 and 3. 

(2) Agent 1 sends more than one message in round r. Suppose that agent 1 has successfully sent
messages to agents 2 and 3 in round r.

By an argument similar to the previous case, we can prove that if we remove the message sent from
agent 1 to agent 2 (or agent 3), the new dictator will become agent 2 (or agent 3). Now we remove
the messages sent from agent 1 to both agent 2 and agent 3. Suppose in this case the new dictator is
some agent i, and without loss of generality suppose that $i \ne 3$. Now we let agent 1 and agent 2 be
in the colluding group. Then consider the failure pattern F' in which agent 1 has successfully sent
a message to agent 2 but not agent 3. Agent 2 can pretend that he did not receive the message from
agent 1 in round r, because he will know through the private channel at the end of round r that agent
1 crashes in round r. In this case, the dictator will be agent i instead of agent 3, which could benefit
agent 2 if 2 prefers i's most preferred value over agent 3's.

□
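
The inference that private channels make available to colluders, and that both cases of the proof rely
on, can be sketched as follows. This is an illustration under stated assumptions: the function name and
the report format are hypothetical, and the sketch assumes the sender was supposed to send to every
colluder in that round.

```python
def crashed_this_round(reports, sender):
    """Infer from pooled colluder reports whether `sender` crashed during the round.

    `reports` maps each colluder to the set of agents he heard from in the round.
    In the crash model, a sender who reaches some agents but not others within
    the same round must have crashed while sending.
    """
    heard = [sender in got for agent, got in reports.items() if agent != sender]
    return any(heard) and not all(heard)

# Colluders 2 and 3 compare notes after round r: 2 heard from agent 1, 3 did not.
reports = {2: {1, 3}, 3: {2}}
print(crashed_this_round(reports, sender=1))  # True
```

Once the colluders share this knowledge, the one who did receive the message can safely deny it,
since no honest agent will later contradict him by relaying anything further from the crashed sender.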

5 Conclusion and future directions 

In this paper, we propose new protocols that are resilient to both crash failures and strategic
manipulations. We argue that combining crash failures with strategic manipulations is an interesting
research area that both addresses practical scenarios and enriches the theory of fault-tolerant
distributed computing.

There are many open problems and research directions one can look into. First, within the problem setting
of this paper, several interesting open problems are left to be explored: (a) whether a deterministic protocol
resisting three or more colluders exists; and (b) whether the gap between our round complexity ($2f + 2$ or
$3f + 4$ depending on the cases) and the round complexity of standard protocols without selfish agents ($f + 1$)
can be closed, or there is an intrinsic cost in tolerating manipulations on top of crash failures. Going beyond
the setting of this paper, one can look into other utility functions such as message transmission costs, other
distributed computing tasks, other distributed computing models such as asynchronous or shared memory
systems, or other types of failures such as omission failures. We hope that our work will stimulate other
researchers to invest in this emergent area of incentive-compatible and fault-tolerant distributed computing.